An observability problem for both deterministic and stochastic
systems is studied here.

Deterministic observability is a determination of whether every
state of the system is connected to the observation mechanism and, if
so, how it is connected. Stochastic observability, on the other hand,
concerns the "tightness" of that connection in a chosen statistical
sense.

For deterministic system observability, two conditions,
connectedness and univalence, are obtained from a modification of the
global implicit-function theorem. Depending on how the conditions are
satisfied, observability is classified into three categories:
observability in the strict sense, observability in the wide sense, and
the unobservable case.

Two underwater tracking examples are analyzed: the bearing-only-target
(BOT) problem described in the mixed-coordinate system, and an array
SONAR problem described in terms of a small number of sensors and
various measurement policies.

For stochastic system observability, an information-theoretic
approach is introduced. The Shannon concepts of information are
considered instead of Fisher information. The quantity computed here is
the mutual information between the state and the observation. Since
this quantity is expressed as an entropy difference between the a priori
and a posteriori processes, two densities are required for its
computation. Due to the difficulty of solving the density equation,
second-moment approximations of the densities are considered here. The
mutual information is then used as a criterion to determine the "degree
of observability."

Information sensitivity with respect to various coordinate
systems, including rectangular, modified polar, and mixed coordinates,
is analyzed for the BOT system. For the array SONAR, combinations of
relative delay and Doppler measurements for up to three sensors are
compared.

OBSERVABILITY AND INFORMATION STRUCTURE
OF NONLINEAR SYSTEMS

CHAPTER  1:  INTRODUCTION 

A state-space description is widely used to represent a physical
dynamic system by a mathematical model. Each state represents some
property of the actual system. To understand the nature of the system
from outside the dynamic model, one must therefore observe or measure
the necessary states. Sometimes, however, it is not possible to access
and measure all of the necessary states from the outside, and even when
it is possible, measuring specific states may be too expensive. In this
case one looks for an indirect way: if every necessary state can somehow
be reconstructed from the less expensive or measurable states only, the
costly or unmeasurable states need not be measured directly.
Observability is the basic system property relevant to this question.
One is interested in determining whether the measured data are enough to
reconstruct all of the states. The importance of system observability
also stems from another aspect: if the system is not observable for some
reason, then states estimated from the insufficient information may be
inaccurate, and any further action based on them, for example feedback
control, may exhibit undesirable performance.


If noise is involved in the description of the system and/or
measurement dynamics, the observability concept changes from the
deterministic case above. Here one is more interested in "how much" the
system is observable in a chosen probabilistic sense, i.e., in a degree
of observability rather than a "yes" or "no" answer. There are many
different ways to measure the degree of observability. One way is to
use information theory. The quantity evaluated here is the common
information, the so-called mutual information, between the state $x_t$
and the observation $y_t$, and this quantity is used as a criterion to
determine the degree of observability; i.e., a calculation is made of
the amount of information about the state $x_t$ that is contained in the
observation $y_t$.

In Chapter Two, deterministic observability is studied. After
the problem is defined, observability criteria for linear systems and
former results for nonlinear systems are summarized. Since nonlinear
observability is a problem of geometric functional structure, a
functional-analytic approach is used. A modified version of the
global implicit-function theorem is obtained from the result of Palais
[1]. To apply the modified theorem to the nonlinear observability
problem, an appropriate algebraic modification of the observation
equation is required. Two conditions, connectedness and univalence,
are thus derived. Depending on how the conditions are satisfied,
observability is classified into three categories: observability in the
strict sense, observability in the wide sense, and the unobservable
case. Two important application examples are analyzed using the
result: BOT tracking described in the mixed-coordinate system, and an
array SONAR with a small number of sensors and various measurement
policies.

In Chapter Three, stochastic-system observability is studied
using an information-theoretic approach. The term "information" is
interpreted here in the Shannon sense rather than the Fisher sense, so
information is not an abstract quantity but a substantial quantity
having appropriate units. Starting from the basic definitions of
information and entropy, mutual information is introduced and expressed
as an entropy difference, i.e., the difference between the unconditional
and conditional entropies. Since evaluating the mutual information of
stochastic processes requires more conditions than for simple random
variables, these conditions are introduced using measure theory. Under
the proper conditions, entropy is expressed in terms of estimation
covariances. The mutual information can therefore be obtained from two
covariances, the unconditional and the conditional, and both can be
obtained from an adopted filter algorithm. The non-Gaussian case,
however, generally requires knowledge of the probability distribution or
higher-order moments. Here second-moment approximations of the
densities are considered.
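
As a rough illustration of the covariance-based computation described above, the following sketch assumes jointly Gaussian densities, for which the entropy difference reduces to one half of the log-ratio of the covariance determinants. The function names and the numerical covariances are hypothetical and are not taken from the thesis; in practice the two covariances would come from the adopted filter.

```python
import math

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def gaussian_mutual_information(p_prior, p_post):
    """I(x; y) in nats under a Gaussian assumption:
    0.5 * ln(det P_prior / det P_post)."""
    return 0.5 * math.log(det(p_prior) / det(p_post))

# Hypothetical covariances: the measurement reduces only the variance
# of the first state; the second state gains no information.
P_PRIOR = [[4.0, 0.0], [0.0, 9.0]]
P_POST = [[1.0, 0.0], [0.0, 9.0]]
info_nats = gaussian_mutual_information(P_PRIOR, P_POST)
```

If the conditional covariance equals the unconditional one, the sketch returns zero, which corresponds to an observation that carries no information about the state.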

A brief discussion of the relationship between deterministic and
stochastic observability follows. A result on the relationship
between the Fisher information and Shannon's mutual information is
discussed.


Chapter Four shows simulation results for various practical
problems in view of observability and information structure. A simple
linear-system example is followed by the BOT tracking and array SONAR
problems analyzed in Chapter Two.

Information structures of the observable and unobservable cases for
all examples are compared under various parameter changes. Estimation
error analysis in terms of information content is shown. Chapter Five
summarizes the results.


Notation 


The  following  notations  will  be  used  throughout: 


$R^n$                   Euclidean n-dimensional space
$\|\cdot\|$             Euclidean norm
$\mathrm{tr}\,A$        Trace of a matrix A
$A^*$                   Conjugate transpose of matrix or vector A
                        ($A^*$ will be used when A is real)
$A^{(n)}(t)$            n-th time derivative of A(t)
$\partial/\partial x$   Gradient vector of nonanticipative functionals
$\partial^2/\partial x\,\partial x^T$
                        Jacobian matrix of nonanticipative functionals
$\ll$                   Absolute continuity or negligibly small
                        Equivalence or approximated quantity
$(\Omega, F, \mu)$      Complete measure space
$(\Omega, F, P)$        Complete probability space
$F_t$                   Sub-$\sigma$-algebra of F
$|\cdot|$               Absolute value
$\dot{x}(t)$            Denotes dx/dt
$x(t)$                  Deterministic time variable of vector x
$x_i(t)$                Scalar quantity of x(t)
$x_t$                   Stochastic time variable of vector x for a particular
                        elementary event $\omega \in \Omega$
                        Scalar quantity of $x_t$
$\{x_t\}$               Stochastic vector process
$E[x_t]$                Expectation of $x_t$
$E[x_t \mid y_t]$       Conditional expectation with respect to a given
                        measurement $y_t$
$E[x_t \mid F_t]$       Conditional expectation with respect to a given
                        sub-$\sigma$-algebra generated by $\{y_s,\ 0 < s < t\}$
$C^1$                   Space of continuously differentiable functions
$[a, b]$                Closed interval
$(a, b)$                Open interval
$a \in A$               a is an element of A
P                       Covariance matrix
p                       Probability distribution (probability density function
                        when not confused with distribution)
p                       Probability density function
**                      End of proof

$f|_U$                  f restricted to U

CHAPTER 2: OBSERVABILITY OF DETERMINISTIC NONLINEAR SYSTEMS

2.1  The observability problem and former results

Consider a mathematical description of a physical dynamic system
expressed as the first-order vector differential equation

\[
\frac{dx(t)}{dt} = f(x(t), u(t), t), \tag{2-1}
\]

where $x(t)$ is an n-dimensional state vector, $u(t)$ is an
r-dimensional control input, and $t$ is the time variable. Assume the
dynamic property of the system is known, i.e., the n-vector-valued
function $f(\cdot)$ and $u(t)$ are known for $t \ge t_0$. Further assume
that $f(\cdot)$ satisfies the existence and uniqueness conditions for
$x(t)$, i.e.,


1. $f(\cdot)$ is continuous in $t$ and once continuously differentiable
in $x$ and $u$ for fixed $t$, $t \in [0, \infty)$.

2. $f(\cdot)$ satisfies a uniform Lipschitz condition in $x$,

\[
\| f(x_1(t), \cdot) - f(x_2(t), \cdot) \| \le M \| x_1(t) - x_2(t) \|, \tag{2-2}
\]

where $\|\cdot\|$ is the Euclidean norm and $M$ is a bounded positive
real constant. Under the above conditions one wants to know the time
trajectory of $x(t)$ from (2-1). For this purpose one constructs an
integral operator $g(\cdot)$ such that


\[
x(t) = g(x(t_0), u(t), t). \tag{2-3}
\]


But knowing the operator $g(\cdot)$ does not mean that one can
actually obtain the solution trajectory $x(t)$ of (2-1), because the
initial state $x(t_0)$ in (2-3) is not known. If one can somehow
establish $x(t_0)$, the problem is solved. Since there is no way to
know $x(t_0)$ from the system model equation (2-1) itself, in practice
one constructs another equation, known as a "measurement" or
"observation" equation. Using appropriate measuring or observing
devices, the necessary state variables or other variables are observed
over some period of time, say $[t_0, t_1]$. Then, using the observed
data, $x(t_0)$ might be determined indirectly. This observation
mechanism might be modelled mathematically as

\[
y(t) = h(x(t), t), \tag{2-4}
\]

where $h(\cdot)$ is an m-dimensional vector function and $y \in R^m$.
Here m is not necessarily the same as n. Usually, from the standpoint
of physical availability and economy, m is less than n.

If (2-4) is uniquely solvable for $x(t)$, then every state $x_i(t)$,
$i = 1, 2, \dots, n$, can be computed from the currently measured $y(t)$
alone, i.e., the measured information is in a sense complete. But if
the observed information is incomplete, i.e., (2-4) is not uniquely
solvable for $x(t)$, then there arises the problem of evaluating the
state $x(t)$ by some indirect method that uses the state equation (2-1)
as well as the observation equation (2-4).


The observability problem has been well investigated, and the
result is clear for the linear system, where a test of nonsingularity
of the observability matrix or, equivalently, a rank test is enough.

Unfortunately, these techniques are not applicable to the general
nonlinear system, since even when the observability matrix is
nonsingular or of full rank, one cannot solve uniquely for $x(t)$ from
(2-1) and (2-4). Thus $x(t_0)$ cannot be determined uniquely. Before
this problem is investigated further, the former results are
summarized.


2.1.1  Former  results  on  system  observability 


1. Linear system

Consider the time-varying linear system

\[
\dot{x}(t) = A(t)x(t) + B(t)u(t), \tag{2-5}
\]
\[
y(t) = C(t)x(t) + D(t)u(t), \tag{2-6}
\]

where the matrices $A(t)$, $B(t)$, $C(t)$, $D(t)$ are known and are
n x n, n x r, m x n, and m x r, respectively, with entries continuous in
$t$ over $(-\infty, \infty)$. Observability of the system (2-5), (2-6)
is dealt with in most standard textbooks [2], [3].

First define the observability of the linear system (2-5), (2-6)
as follows:

Definition [3]

The system (2-5), (2-6) is completely observable at $t_0$ if for any
$x(t_0)$ there exists a finite $t_1 > t_0$ such that knowledge of $u(t)$
and $y(t)$, $t \in [t_0, t_1]$, is sufficient to determine $x(t_0)$.

From the solution of (2-5), $y(t)$ of (2-6) becomes

\[
y(t) = C(t)\Phi(t, t_0)x(t_0) + C(t)\int_{t_0}^{t} \Phi(t, s)B(s)u(s)\,ds + D(t)u(t), \tag{2-7}
\]

where $\Phi(\cdot)$ is the transition matrix of the homogeneous part of
(2-5). From (2-7) the observability criterion is derived as [2]:

Criterion 1

The system (2-5), (2-6) is observable at $t_0$ if and only if the
columns of the m x n matrix function $C(t)\Phi(t, t_0)$ are linearly
independent on $[t_0, t_1]$.

By multiplying by $\Phi^*(t, t_0)C^*(t)$, integrating from $t_0$ to
$t$, and retaining the zero-input response of (2-7), Criterion 2 is
obtained.

Criterion 2

The system (2-5), (2-6) is observable at $t_0$ if and only if the
Gramian matrix $N(\cdot)$,

\[
N(t_0, t) = \int_{t_0}^{t} \Phi^*(s, t_0)C^*(s)C(s)\Phi(s, t_0)\,ds, \tag{2-8}
\]

is nonsingular.
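
The Gramian test of Criterion 2 can be sketched numerically. The example below is an illustration only: it assumes a double integrator (position and velocity), whose transition matrix is known in closed form, a single real-valued output, and a midpoint-rule quadrature. The function names, the interval, and the step count are hypothetical choices, not part of the original text.

```python
def transition(s, t0):
    """Transition matrix Phi(s, t0) of the double integrator
    x1' = x2, x2' = 0 (an assumed example system)."""
    return [[1.0, s - t0], [0.0, 1.0]]

def gramian(c_row, t0, t1, steps=2000):
    """Midpoint-rule approximation of the Gramian (2-8) on [t0, t1]
    for a single-output system with real output row c_row."""
    n = len(c_row)
    total = [[0.0] * n for _ in range(n)]
    ds = (t1 - t0) / steps
    for k in range(steps):
        s = t0 + (k + 0.5) * ds
        phi = transition(s, t0)
        # Row vector C * Phi(s, t0)
        row = [sum(c_row[i] * phi[i][j] for i in range(n)) for j in range(n)]
        # Accumulate (C Phi)^T (C Phi) ds
        for i in range(n):
            for j in range(n):
                total[i][j] += row[i] * row[j] * ds
    return total

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

N_POSITION = gramian([1.0, 0.0], 0.0, 2.0)  # position is measured
N_VELOCITY = gramian([0.0, 1.0], 0.0, 2.0)  # only velocity is measured
```

With position measured, the determinant is close to $T^4/12$ for interval length $T$, so the Gramian is nonsingular. With only velocity measured, the determinant is zero, reflecting that the initial position cannot be recovered.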

Another criterion, which is more convenient to apply, can be
derived from Criterion 1, i.e.,

\[
V(t) = [\, F^*(t) \mid F^{(1)*}(t) \mid \cdots \mid F^{(n-1)*}(t) \,] \tag{2-10}
\]

has rank n. Thus we have the third criterion.


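Criterion 3

The system is observable at $t_0$ if and only if there exists a
$t \in [t_0, t_1]$ such that the observability matrix

\[
V^*(t) = \begin{bmatrix} Q_0(t) \\ Q_1(t) \\ \vdots \\ Q_{n-1}(t) \end{bmatrix} \tag{2-11}
\]

has rank n, where

\[
Q_{k+1}(t) = Q_k(t)A(t) + \frac{d}{dt}Q_k(t), \quad k = 0, 1, \dots, n-1, \tag{2-12}
\]
\[
Q_0(t) = C(t).
\]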

For the time-invariant linear case the following observability
conditions are equivalent. The time-invariant linear system is
observable at $t_0$ in $[0, \infty)$ if one of the following conditions
is satisfied:

1) The columns of $Ce^{At}$ are linearly independent on $[0, \infty)$.

2) The columns of $C(sI - A)^{-1}$ are linearly independent, where $s$
is the Laplace transform parameter.

3) The matrix

\[
N(t_0, t) = \int_{t_0}^{t} e^{A^*(s - t_0)}C^*Ce^{A(s - t_0)}\,ds
\]

is nonsingular for any $t_0 \ge 0$ and $t > t_0$.

4) The mn x n observability matrix

\[
V^* = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix} \tag{2-13}
\]

has rank n.
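
The rank test on the observability matrix (2-13) is easy to sketch in code. The following is a minimal illustration using exact rational arithmetic from the standard library. The helper names and the double-integrator matrices are assumptions made for the example.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rank(matrix):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank_count = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank_count, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for r in range(len(rows)):
            if r != rank_count and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank_count][col]
                rows[r] = [x - factor * y
                           for x, y in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count

def observability_matrix(a, c):
    """Stack C, CA, ..., CA^(n-1) as in (2-13)."""
    blocks, current = [], c
    for _ in range(len(a)):
        blocks.extend(current)
        current = matmul(current, a)
    return blocks

# Illustrative system: double integrator with state (position, velocity).
A = [[0, 1], [0, 0]]
RANK_POSITION = rank(observability_matrix(A, [[1, 0]]))  # measure position
RANK_VELOCITY = rank(observability_matrix(A, [[0, 1]]))  # measure velocity
```

Measuring position gives rank 2 (observable), while measuring only velocity gives rank 1, so the initial position is not recoverable in that case.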


2. Nonlinear system

As is known, the observability property of a general nonlinear
system is not a global property, i.e., a nonlinear system that is
observable in one time interval or one portion of the state space may be
unobservable in a different interval. In a geometric sense, the
functional relation between the measurement space and the state space
might not be a one-to-one correspondence, so that the inverse function
between the two spaces is not uniquely defined globally even though it
is so defined locally.


Various authors have studied the nonlinear observability problem
in many ways. Extension of the linear-system observability criteria
to the nonlinear case is attempted in [4], [5]. The observability
rank condition using Lie algebra [6], [7], [8] and the Taylor series
expansion [9] are reviewed. As the observability problem is sometimes
called "an inverse problem," the inverse function theorem of analysis is
widely used. In this approach the Jacobian matrix of the function
related to the observation equation plays a central role. References
[10] - [17] can be viewed in this category.

1) Linearization method

The nonlinear system and observation equations

\[
\dot{x}(t) = f(x(t), u(t), t), \tag{2-14}
\]
\[
y(t) = h(x(t), t), \tag{2-15}
\]

are linearized around some reference point, for example the origin,
an equilibrium point, or a proper operating point, to study the
properties of the neighborhood around it. A linearized version of
(2-14), (2-15) is obtained as

\[
\delta\dot{x}(t) = F\,\delta x(t) + G\,\delta u(t), \tag{2-16}
\]


\[
L_{f_1}(\cdots(L_{f_k}(h))\cdots), \quad k = 1, 2, \dots, m.
\]

The Lie differential $dg(x)$ is then a finite linear combination

\[
dg(x) = \{ d(L_{f_1}(\cdots(L_{f_k}(h))\cdots)) \}
      = \{ L_{f_1}(\cdots(L_{f_k}(dh))\cdots) \}. \tag{2-21}
\]

The observability rank condition is satisfied if $dg(x)$ in (2-21) has
rank n.


3) Taylor series expansion [9]

The Taylor series expansion of (2-15) about an initial condition
$x(t_0) = x_0$ at $t_0$ is

\[
y(t) = y(t_0) + y'(t_0)\,\Delta t + y''(t_0)\frac{\Delta t^2}{2!} + \cdots
     = \sum_{i=0}^{\infty} y^{(i)}(t_0)\frac{\Delta t^i}{i!}. \tag{2-22}
\]

Define the collection of all the coefficients of (2-22) to be $Y$ such
that

\[
Y = [\, y^{(i)}(t_0),\ i = 1, 2, \dots, \infty \,]^T = H(x_0). \tag{2-23}
\]

Then the one-to-one property of the function (2-23) is checked. In
actual applications, $y^{(i)}(t_0)$, $i = 1, 2, \dots$, is checked to see
whether it is an even function of $x_0$.
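
To see why an even dependence on $x_0$ rules out a one-to-one map, consider a small scalar illustration. The system below is an assumed example chosen for simplicity, not one taken from [9].

```latex
% Assumed scalar example: \dot{x} = -x, \quad y = x^2.
\begin{align*}
  y(t_0)       &= x_0^2, \\
  y^{(1)}(t_0) &= 2x\dot{x}\,\big|_{t_0} = -2x_0^2, \\
  y^{(2)}(t_0) &= -4x\dot{x}\,\big|_{t_0} = 4x_0^2, \\
  y^{(i)}(t_0) &= (-2)^i x_0^2 .
\end{align*}
% Every coefficient is an even function of x_0, so H(x_0) = H(-x_0):
% the initial states x_0 and -x_0 produce identical outputs and cannot
% be distinguished from the measurement.
```

In this example the magnitude of $x_0$ can be recovered from $y(t_0)$, but its sign cannot, so the map from initial state to output coefficients is not one-to-one.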




4) Jacobian matrix approach

The observation equation $y(t)$ is differentiated successively, with
appropriate substitution from the system equation (2-14). Then the
Jacobian matrix $J(\cdot)$ evaluated at $x_0$ is analyzed as follows:

i) Rank test of $J(\cdot)$ [10], [11]

or, equivalently, $\det J \neq 0$ is tested [17].

ii) Ratio condition [13], [14], [15]

The ratio condition is satisfied if the absolute value of the leading
principal minor of $J(\cdot)$ is greater than $\epsilon > 0$, i.e.,

(2-24)

where $J_i$ is obtained by taking only the first i rows and columns of
$J$. Singh [14] checked the ratio condition for the matrix $AJ$, where
$A$ is an arbitrary n x mk matrix for the k-th derivative of $y(t)$ such
that $mk > n$.

iii) Positive semidefiniteness of $AJ$ [13], [14], [16]

Again $A$ is an arbitrary n x mk matrix that makes $AJ$ an n x n
matrix. The system is then said to be observable if one can find a
matrix $A$ such that $AJ$ is positive semidefinite.

iv) Minor matrix analysis of $J$ [12]

The minor matrices $J_i$ of the matrix $J$ are constructed. Then, for
each $J_i$, an unobservable set is obtained as

\[
D_i = \{\, x \mid J_i \neq 0,\ J_{i+1} = 0 \,\}, \quad i = 1, 2, \dots, n-1. \tag{2-25}
\]

In spite of the many results, it is found that some are
insufficient [9] - [11], [13], [14], some are too complicated to apply
in practice [12], and some are applicable only to a special class of
nonlinear systems, such as in [13], or to linearized systems.

Introduced in the subsequent section is a new method which is
simple to apply in practical problems and which not only provides a
test of the observability of the system but also identifies the
unobservable states when the system is unobservable. This approach is
based on Palais' global implicit-function theorem [1] and its later
versions [19], [20].

Modification of both the nonzero-Jacobian condition and the
finite-covering condition is required before they can be applied to
system observability. A modified version of the global
implicit-function theorem is used in section three, where its
simplicity and effectiveness are demonstrated with various examples,
including tracking of a maneuvering target where only bearing
information is extracted from the measurement, and an array SONAR
tracking problem with a small number of sensors.


2.2  A modified form of the global implicit-function theorem

The most common inverse-function theorem guarantees only the
existence of a local inverse, in terms of a nonzero determinant of
the Jacobian of the function $f(\cdot)$. The implicit-function theorem
is an extension of this theorem that includes augmented variables.
The global versions of these theorems are the global inverse-function
theorem and the global implicit-function theorem, respectively. Both
theorems, in a global sense, require the nonzero $\det J(\cdot)$ and
finite-covering conditions. It is shown here that both conditions can
be modified further to be sufficient conditions for $f$ to be uniquely
invertible. That is, without losing the global homeomorphic property
of $f$, one can relax the nonzero-Jacobian condition from n dimensions
to n-1 dimensions for a special structure of $f$. The finite-covering
condition, however, needs to be strengthened to the one-covering
condition. The modified version of the global implicit-function
theorem will then be used to determine the observability of a given
nonlinear system. See Appendix A for the inverse and implicit function
theorems and some related definitions.

Global versions of the local inverse and implicit function
theorems have been studied by several authors [25], [26], [27]. Here
these theorems are restated without proof; the proofs can be found in
the cited references.


Theorem 2-1 Global inverse function theorem

Let $f$ be n real functions of n real variables. The necessary
and sufficient conditions that the function $f: R^n \to R^n$ defined by

\[
f(x) = y, \quad x \in R^n, \ y \in R^n,
\]

be a $C^1$ diffeomorphism of $R^n$ onto itself are

i) each $f_i(x)$ is of class $C^1$,

ii) $\det J_f(x) \neq 0$,

iii) $\lim \| f \| = \infty$ as $\| x \| \to \infty$.


Theorem 2-2 Global implicit function theorem

Let $f$ be n real functions of n + r real variables ($n \ge 1$,
$r \ge 1$). Consider the function $f: R^n \times R^r \to R^n$ such that

\[
f(x, v) = y,
\]

where $x \in R^n$, $v \in R^r$, $y \in R^n$, and $f$ is $C^1$ in $x$ and
$v$. Then there exists a unique $C^1$ function $g$ such that
$g: R^n \times R^r \to R^n$, $x = g(y, v)$, if

i) $\det J_f(\cdot) \neq 0$ for all $x$ and $v$, where $J_f = \partial f / \partial x$,

ii) $\lim \| f(x, v) \| = \infty$ as $\| x \| \to \infty$.

Condition iii) in Theorem 2-1, or condition ii) in Theorem 2-2, is
called a "finite-covering" condition (see below).

Next it is shown that the nonzero-Jacobian and finite-covering
conditions of both theorems are not enough for $f$ to be a one-to-one
correspondence. Appropriate modification is required to provide
sufficient conditions. Before the discussion is presented, the
following terms are defined.

Definitions [26], [31]

A cover for a set $A$ is a collection $\mathcal{V}$ of sets such that
$A \subseteq \bigcup_{V \in \mathcal{V}} V$.

Let $X$ and $Y$ each be connected spaces. If $f$ maps $X$ onto $Y$ with
the property that each $y \in Y$ has an open neighborhood $V$ such that
each component of $U = f^{-1}(V)$ is mapped homeomorphically onto $V$ by
$f$, then $f$ is called a covering map. In this case, if the cardinal
number of the components is n, then $f$ is an n-covering map. If n is
finite, then it is a finite-covering map, and if n = 1, then it is a
one-covering map.

Note that the finite-covering condition excludes the possibility
that $f$ oscillates infinitely as $\| x \| \to \infty$. With the above
definitions, the next two lemmas show that the homeomorphism of $f$ (at
least in a local sense) provides sufficiency for $f$ to be a
finite-covering function. But the converse is not true (see Example
2-1).


Lemma 2-1 [27]

Let $f: X \to Y$, $X \subseteq R^n$, $Y \subseteq R^n$, be a local
homeomorphism. A necessary and sufficient condition that $f$ be a
finite covering is that

\[
\lim_{\| x \| \to \infty} \| f(x) \| = \infty.
\]

Lemma 2-2 [26]

Let $f: X \to Y$, $X \subseteq R^n$, $Y \subseteq R^n$. If $f$ is a
homeomorphic function of $R^n$ onto itself, then

\[
\lim_{\| x \| \to \infty} \| f(x) \| = \infty.
\]


Example 2-1

Consider the two-dimensional function $f$ given by

\[
f(x) = \begin{bmatrix} x_1^2 + x_2^2 \\ 2x_1^2 + 4x_2^2 \end{bmatrix}.
\]

Then

\[
\lim_{\| x \| \to \infty} \| f(x) \|
  = \lim_{x_1^2 + x_2^2 \to \infty} \{ (x_1^2 + x_2^2)^2 + (2x_1^2 + 4x_2^2)^2 \}^{1/2} = \infty.
\]

Clearly the finite-covering condition is satisfied, but the actual
solution of the two equations

\[
f_1(x) = x_1^2 + x_2^2 = y_1, \qquad f_2(x) = 2x_1^2 + 4x_2^2 = y_2,
\]

yields the non-unique solutions

\[
x_1 = \pm\sqrt{\frac{4y_1 - y_2}{2}}, \qquad x_2 = \pm\sqrt{\frac{y_2 - 2y_1}{2}}.
\]

Thus $f$ is only locally homeomorphic, i.e., $f$ is not one-to-one
globally. Both $x_1$ and $x_2$ are covered by the two "sheets" of the
cover. However, the existence of the two independent solutions is
guaranteed by a nonzero determinant of the Jacobian,

\[
\det J_f(x) = 8x_1x_2 \neq 0,
\]

i.e., with $x_1 \neq 0$ and $x_2 \neq 0$.
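
The non-uniqueness in Example 2-1 can be checked numerically. The sketch below evaluates the map at a test point, recovers all real preimages from the closed-form roots, and evaluates the Jacobian determinant. The function names and the test point are illustrative choices.

```python
from itertools import product

def f(x1, x2):
    """Map of Example 2-1."""
    return (x1**2 + x2**2, 2 * x1**2 + 4 * x2**2)

def det_jacobian(x1, x2):
    """det [[2*x1, 2*x2], [4*x1, 8*x2]], which equals 8*x1*x2."""
    return (2 * x1) * (8 * x2) - (2 * x2) * (4 * x1)

def preimages(y1, y2):
    """All real solutions of f(x) = y, from the closed-form roots."""
    x1_sq = (4 * y1 - y2) / 2
    x2_sq = (y2 - 2 * y1) / 2
    if x1_sq < 0 or x2_sq < 0:
        return []
    r1, r2 = x1_sq ** 0.5, x2_sq ** 0.5
    # The set removes duplicates when a root is zero.
    return sorted(set(product((r1, -r1), (r2, -r2))))

Y = f(1.0, 2.0)            # image of the test point (1, 2)
SOLUTIONS = preimages(*Y)  # every sign combination of (1, 2)
```

For the test point, four distinct states map to the same value of $y$ even though the Jacobian determinant is nonzero at each of them, which is the failure of global one-to-one correspondence described above.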

From the above two lemmas and the example, it is clear that the
finite-covering condition provides only a "weak" sufficient condition
for $f$ to be a globally homeomorphic function.

Even though the global function theorems have played a fundamental
role in many research works on nonlinear systems, the nonzero-Jacobian
and finite-covering conditions together are not enough to provide
sufficient conditions for $f$ to be a one-to-one correspondence. To
discuss this more specifically, further definitions are made next.


The function has a global inverse on $R^3$,

\[
x_1 = (y_1)^{1/3}, \qquad x_2 = (y_2)^{1/3}, \qquad
x_3 = y_3 - (y_1)^{1/3} - (y_2)^{1/3}.
\]

Hence $f$ is a homeomorphic onto function unless

\[
\det J_f(x) = 9x_1^2x_2^2 = 0
\]

by $x_1 = 0$ and $x_2 = 0$.

$\det J_f(x) = 0$ is allowed by either $x_1 = 0$ or $x_2 = 0$ without
losing functional independence. Note that both $x_1$ and $x_2$ are
absolutely independent variables.

Thus the nonzero-Jacobian condition can be weakened to (n-1)
dimensions instead of n dimensions for the special form of $f$.
Meanwhile, the finite-covering condition must be strengthened to a
one-covering condition. But neither condition alone is enough for $f$
to be a globally homeomorphic function, since the nonzero-Jacobian
condition alone lacks globality of the inverse and the one-covering
condition alone lacks independence of $f$. Consequently, we have the
following adaptation of the previous theorem.


Theorem 2-3

Let $f: X \to Y$, $X \subseteq R^n$, $Y \subseteq R^n$, be an onto $C^1$
function. Then $f$ is globally homeomorphic from $X$ onto $Y$ if

i) $\det J_f(x) \neq 0$ for all $x$
(with the Jacobian reduced to n-1 dimensions if $f$ contains absolutely
independent functions),

ii) $f(x)$ is a one-covering function for all $x$.

Proof

We need to prove that the two conditions imply a global
homeomorphism of $f$. First, consider the case when $f$ has no
absolutely independent functions. Then by the inverse function
theorem $f$ is a local homeomorphism from $X$ to $Y$. So, with a
restriction $U$ on $f$, $f|_U(x)$ is one-to-one from $U$ onto $Y$. Next,
if $f$ has some absolutely independent function, then the reduced
nonzero-Jacobian condition provides a local homeomorphism from $X$ to
$Y$ for the remaining functions. The function that is excluded is
already independent of the others; thus it is at least locally
homeomorphic by condition ii). So $f$ is locally homeomorphic, and
again a restriction $U$ exists such that $f$ is one-to-one from $U$ to
$Y$. Hence, if we can show that $U = X$, the proof will be complete.
Suppose $U$ is a proper subset of $X$. Since $U$ is open in $X$, $U$ is
an open proper subset of $X$. Let $\bar{x}$ be a boundary point of $U$,
and let $V$ be an open connected neighborhood of $f(\bar{x})$. Since
$f$ is a one-covering map on $X$, $f^{-1}(V)$ is not empty and consists
of one component. Let $N_x$ denote this component. Surely $N_x$
contains $\bar{x}$. Let $N_x^* = U \cap f^{-1}(V)$. Since $f$ is
continuous, $N_x^*$ is open. Hence both $N_x$ and $N_x^*$ are open and
connected. Also note that $f$ maps both $N_x$ and $N_x^*$ onto $V$.
Since $N_x$ is open and contains $\bar{x}$, the set $N_x \cap U$ is also
not empty. It follows that $N_x \cap N_x^*$ is not empty; otherwise
there would be at least one point $x_1$ in $N_x \cap U$ and a point
$x_2$ in $N_x^*$ such that $f(x_1) = f(x_2) \in V$, and $f$ would not be
one-to-one on $U$, which is a contradiction. Hence $N_x = N_x^*$, i.e.,
$\bar{x}$ is in $N_x^*$ and therefore in $U$. This implies that $U$
cannot be an open proper subset of $X$; that is, $U$ is closed in $X$.
So $U$ is both open and closed in $X$ and nonempty. Therefore $U = X$.
**


Remarks

1. Globally homeomorphic from $X$ to $Y$ is identical to global
one-to-one correspondence and continuity [30].

2. Every homeomorphic onto function is a covering map, and every
covering map is locally homeomorphic.

3. The nonzero-Jacobian condition can even be relaxed to n-1
dimensions. Here n dimensions will be assumed in the general
discussion, since the n-dimensional condition $\det J_f \neq 0$ always
includes the (n-1)-dimensional one.

Lemma 2-3

If no entry of the Jacobian $J$ of $f$ changes sign along the real
line of $x$, then $f$ is globally a one-covering map.

Proof

The entry $J_{ij} = \partial f_i / \partial x_j$, $i, j = 1, 2, \dots, n$,
is the variation of the function $f_i$ with respect to the j-th
direction of $x$. If $J_{ij}$ does not change sign with $x_j$, then
$f_i$ is monotone in the j-th direction, i.e., $f_i$ is a one-covering
function with respect to $x_j$. If no function $f_i$ has any such sign
change in any direction, then $f$ is a one-covering function
globally. **

For f to be a multiple-covering function in some direction, the sign
of the slope given by the corresponding Jacobian entry must change
along that direction. The number of possible covers is then one plus
the number of sign changes. The nonzero-Jacobian condition may be
combined with this to constitute one method to determine the
one-to-one correspondence of f. See Theorem 2-4 below.
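
The counting rule above (covers = one plus the number of sign changes)
can be illustrated for a scalar function with a short sketch. This is
only an illustration under stated assumptions: the helper names
`count_sign_changes` and `max_covers`, the sampling grid, and the two
test functions are hypothetical and are not part of the original
development.

```python
def count_sign_changes(derivative, lo, hi, samples=2001):
    """Count sign changes of `derivative` on a uniform grid over [lo, hi].

    Grid points where the derivative is exactly zero are skipped, so a
    touch of zero without a change of sign is not counted.
    """
    step = (hi - lo) / (samples - 1)
    changes = 0
    previous_sign = 0
    for k in range(samples):
        value = derivative(lo + k * step)
        sign = (value > 0) - (value < 0)
        if sign != 0:
            if previous_sign != 0 and sign != previous_sign:
                changes += 1
            previous_sign = sign
    return changes


def max_covers(derivative, lo, hi):
    """Upper bound on the number of covers: one plus the sign changes."""
    return 1 + count_sign_changes(derivative, lo, hi)


# Hypothetical test functions (assumptions, not from the text):
monotone_slope = lambda x: 3 * x**2 + 1   # slope of x**3 + x, never changes sign
cubic_slope = lambda x: 3 * x**2 - 3      # slope of x**3 - 3x, changes sign twice

covers_monotone = max_covers(monotone_slope, -3.0, 3.0)
covers_cubic = max_covers(cubic_slope, -3.0, 3.0)
```

With these choices the monotone function is reported as a one-covering
map, while the cubic may have up to three covers, in line with the
rule stated above.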

Lemma 2-4

If  the  Jacobian  J  of  f  (x)  is  either  positive  or  negative 
definite  for  all  x,  then  f(x)  is  a  global  one-covering  map. 

Proof 

The proof for the positive definite case is given in [19]. The
negative definite case can be proven similarly. **

In Lemma 2-4, the nonzero-Jacobian condition is already implied and
hence is not required here. A modified version of the global inverse
function theorem allows us to adapt the global implicit function
theorem as follows:

Theorem  2-4 

Consider $f: X \times U \to Y$, $x \in R^n$, $u \in R^r$, $y \in R^n$
such that

    $f(x,u) = y$.  (2-26)

Suppose f is a $C^1$ function in x and u. If f satisfies the
following two conditions,

i) $\det J_f(\cdot) \neq 0$ for all x,

ii) f(x,u) is a one-covering map on all x,

then there exists a unique continuous function g such that

    $x = g(y,u)$.  (2-27)


Proof 

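Define a vector $\hat{x}$ and a vector-valued function $\hat{f}$ as

    $\hat{x} = \begin{bmatrix} x \\ u \end{bmatrix}, \qquad \hat{f} = \begin{bmatrix} \hat{f}_1 \\ \hat{f}_2 \end{bmatrix} = \begin{bmatrix} f(x,u) \\ u \end{bmatrix} = \begin{bmatrix} y \\ u \end{bmatrix} = \hat{y}$,  (2-28)

which maps $R^{n+r}$ onto itself. Obviously $\hat{f}$ is continuously
differentiable with respect to $\hat{x}$ and its Jacobian matrix is

    $\dfrac{\partial \hat{f}}{\partial \hat{x}} = \begin{bmatrix} \partial f/\partial x & \partial f/\partial u \\ 0 & I_r \end{bmatrix}$,  (2-29)

where $I_r$ is an identity matrix with dimension r. Since
$\det(\partial f/\partial x) \neq 0$, $\det(\partial \hat{f}/\partial \hat{x}) \neq 0$
from (2-29). And since $\hat{f}_1 = f(x,u)$ is a one-covering map on x,
and $\hat{f}_2 = u$ is also a one-covering map on u, $\hat{f}(\hat{x})$
is a one-covering map on $\hat{x} = [x, u]^T$. Therefore, by Theorem
2-3, there exists a globally continuous function
$\hat{g} = \hat{f}^{-1}$ such that

    $\hat{g}(\hat{y}) = \hat{x}$,  (2-30)

i.e.,

    $\begin{bmatrix} x \\ u \end{bmatrix} = \begin{bmatrix} g(y,u) \\ g_2(y,u) \end{bmatrix}$  (2-31)

for all $\hat{y} \in R^{n+r}$.

Take the first n equations from (2-31),

    $x = g(y,u)$,  (2-32)

which is also a globally continuous function mapping from $R^{n+r}$
into $R^n$. **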

As shown, a nonzero-Jacobian determinant guarantees the existence of
a local homeomorphic inverse, i.e., it provides the "connectedness" of
every component of x to Y, the measurement space. But the connection
is not necessarily unique. For this reason the nonzero-Jacobian
condition will be called the "connectedness condition" in the
observability problem discussed in the next section.

A one-covering condition, on the other hand, provides the uniqueness
of the connection globally. So, the one-covering condition will be
called the "univalence condition" in the observability problem.
Heuristically, Theorem 2-4 says that the mapping (2-26) is a
one-to-one correspondence globally if every $x_i$, i = 1, 2, ..., n,
can be expressed uniquely in terms of only Y and u for all x.

With this background on nonlinear functions, observability of
nonlinear systems is studied next.

2-3.  Observability  of  Nonlinear  systems 

State and observation equations are given, again, as

    $\dot{x}(t) = f(x(t), u(t), t)$,  (2-33)

    $y(t) = h(x(t), t)$.  (2-34)

As assumed earlier, f(.) satisfies the necessary conditions to
guarantee the existence and uniqueness of the solution x(t). Further,
it is assumed that h(.) is differentiable up to (n-1)-th order with
respect to t. Then, define system observability as follows.

Definition 

System (2-33), (2-34) is observable at $t_0$ if knowledge of the
input u(t) and the output y(t), $t \in [t_0, t_1]$, is sufficient to
determine $x(t_0)$ uniquely for finite $t_1$. If every state
$x(t) \in R^n$ is observable on the time interval $[t_0, t_1]$, then the
system is completely observable.

Note here that, due to the assumption of the existence and uniqueness
of the solution in (2-33), x(t) can be uniquely determined from proper
construction of the integral operator g(.) as in (2-3),

    $x(t) = g(x(t_0), u(t), t)$,  (2-3)

once $x(t_0)$ is known. So, the definition of $x(t_0)$-observability
above also implies x(t)-observability for the considered time interval
$t \in [t_0, t_1]$.

Next, to derive more definitions on the system, differentiate (2-34)
with respect to t and make the appropriate substitution of (2-33)
(suppressing t in the variables):

    $y' = \dfrac{\partial h}{\partial t} + \dfrac{\partial h}{\partial x}\dfrac{\partial x}{\partial t} = h_t + h_x f = h_1(x,u,t)$,

    $y'' = \dfrac{\partial h_1}{\partial t} + \dfrac{\partial h_1}{\partial x}\dfrac{\partial x}{\partial t} + \dfrac{\partial h_1}{\partial u}\dfrac{\partial u}{\partial t} = h_{1t} + h_{1x} f + h_{1u} u' = h_2(x,u,u',t)$,

    $\vdots$

    $y^{(n-1)} = h_{(n-2)t} + h_{(n-2)x} f + h_{(n-2)u} u' + \cdots + h_{(n-2)u^{(n-3)}} u^{(n-2)} = h_{n-1}(x,u,u',\ldots,u^{(n-2)},t)$,  (2-35)

where $y^{(i)}$ denotes the i-th time derivative of y(t).

Define an mn-dimensional vector Y, the measurement vector of the
system (2-33), (2-34), as the left-hand side of (2-35), i.e.,

    $Y = [y, y', y'', \ldots, y^{(n-1)}]^T$,  (2-36)

and an mn-dimensional function H(.), the measurement function of
(2-35), as

    $H(\cdot) = [h, h_1, h_2, \ldots, h_{n-1}]^T$.  (2-37)

Then one obtains an mn-functional relation in vector form

    $Y = H(x,v,t)$,  (2-38)

where v(t) is a function of $u^{(i)}$, i = 1, 2, ..., n-2.

From equation (2-38) the following can be proved.


Theorem  2-5 

If every state $x(t_0)$ is uniquely determined from (2-38), then the
system (2-33), (2-34) is observable at $t_0$.

Proof

The proof will be completed if one can show that the unique
determination of every state $x(t_0)$ from (2-38) is equivalent to
every state being uniquely determined from the measurement y(t),
$t \in [t_0, t_1]$.

Let us expand the function y(t) in a Taylor series for any
$t \in [t_0, t_1]$:

    $y(t) = y(t_0) + y'(t_0)(t-t_0) + 0.5\,y''(t_0)(t-t_0)^2 + \cdots + \dfrac{1}{(n-1)!}\,y^{(n-1)}(t_0)(t-t_0)^{n-1} + r(t)$.  (2-39)

Since the Taylor-series expansion of an arbitrary function is unique,
each coefficient $y^{(i)}(t_0)$, i = 1, 2, ..., n-1, is also unique. So,
once y(t) is determined, $y^{(i)}(t_0)$ is determined uniquely.
However, each coefficient of (2-39) is exactly an element of the
measurement vector Y in (2-38). Therefore, if $x(t_0)$ is uniquely
solvable in terms of Y, v and t in (2-38), then the system is
observable at $t_0$ by the definition. **

Thus, the observability problem of the system is equivalent to
finding the condition under which (2-38) has a unique inverse with
respect to the state x(t). Or, geometrically, the system is observable
if the mapping (2-38) is one-to-one from the state space $x \in R^n$
into or onto the measurement space $Y \in R^{mn}$ for all
$t \in [t_0, t_1]$. (See Figure 1.)


Figure 1. Geometric interpretation of system observability (state
space and measurement space)

So, from the functional analysis results of the previous section and
Theorem 2-5, the system is observable if the following two conditions
are satisfied.

1. Connectedness

Every state $x_i$, i = 1, 2, ..., n, must be connected to some
element of the measurement space Y, i.e., (2-38) constitutes n
independent functions with respect to x in the time interval
$t \in [t_0, t_1]$.

2. Univalence

Further, every state $x_i$, i = 1, 2, ..., n, must be connected
uniquely to the measurement space Y.

As mentioned earlier, the first condition is related to functional
independence and thus to the nonzero-Jacobian condition of (2-38), and
the second condition is related to the one-covering condition. Before
applying Theorem 2-4, (2-38) should be rearranged as follows to reduce
computational complexity. This procedure helps to maximize the
functional independence before applying the nonzero-Jacobian
condition, by deleting functionally dependent elements from the mn
functions of H.

    $y = h(x,t)$,  (2-40)

    $y' = h_1(x,u,t)$.  (2-41)

By appropriately substituting h(.) into $h_1(\cdot)$ one can obtain

    $y' = h_{1a}(y,x,u,t)$.  (2-42)

Repeating this procedure up to (n-1)th order gives

    $y'' = h_{2a}(y,y',x,u,u',t)$,

    $\vdots$

    $y^{(n-1)} = h_{(n-1)a}(y,y',\ldots,y^{(n-2)},x,u,u',\ldots,u^{(n-2)},t)$.  (2-43)

Denote by $\tilde{Y}$ the set consisting of

    $\tilde{Y} = \{y, y', y'', \ldots, y^{(n-2)}\}$,  (2-44)

and

    $V = (u, u', u'', \ldots, u^{(n-2)})$.  (2-45)

Then the vector notation of (2-42), (2-43) becomes

    $Y = H_a(\tilde{Y}, x, V, t)$.  (2-46)

Successive replacement of lower-order derivatives into the
higher-order derivatives as in (2-43) minimizes the functional
dependence between the individual functional elements
$h, h_1, \ldots, h_{n-1}$, since the procedure is exactly the same as
the successive elimination of unknown variables in solving (2-38) for
x. Thus maximum independence between functional elements is obtained.
Next let

    $p = (\tilde{Y}, V, t)$,

then (2-46) becomes

    $Y = H_a(x,p)$.  (2-47)

With (2-47) and Theorem 2-4, determination of the system observability
can be made using the following result.

Main  Result 

System (2-33), (2-34) is observable (in the strict sense) if (2-47)
satisfies the following two conditions for all $t \in [t_0, t_1]$.

i) Connectedness condition

    $\det J_{H^n}(\cdot) \neq 0$,  (2-48)

where $J_{H^n} = \partial H^n / \partial x$, and $H^n$ is any subset of
$H_a$ consisting of n functions.

ii) Univalence condition

For the chosen $H^n(\cdot)$, every state $x_i$, i = 1, 2, ..., n, can
be uniquely expressed in terms of only Y and p.

The assertion is obvious from Theorems 2-4 and 2-5. The actual proof
is similar to the proof of Theorem 2-4.

Depending on the satisfaction of conditions i) and/or ii) of the
result, define and categorize system observability as follows:

1. Observable in the strict sense.

Both conditions are satisfied for at least one combination $H^n$ out
of the mn functions of $H_a$.

2. Observable in the wide sense.

Only the connectedness condition is satisfied for one or more states,
i.e., multiple covering appears in some component of x at some time t.

3. Unobservable.

One or more components of x cannot be expressed in terms of Y and p.
In this case these states are unconnected to Y and thus the system is
unobservable.


The  above  observability  determination  is  demonstrated  by  the 
following  examples. 

Example  2-3 

A falling body in a constant gravity field with position variable
$x_1$ and velocity $x_2$ can be expressed as

    $\dot{x}_1 = x_2$,

    $\dot{x}_2 = -g$,  g is constant.

If one measures position $x_1$, then

    $y = x_1$, and

    $y' = \dot{x}_1 = x_2$.

So, both states are uniquely determined from $Y = (y, y')^T$, and hence
the system is observable. On the other hand, if velocity $x_2$ is
measured, then

    $y = x_2$,

    $y' = \dot{x}_2 = -g$.

Only $x_2$ is connected uniquely to Y. $x_1$ is disconnected and
unobservable; hence the system is unobservable. The classic rank test
can be used to verify this.
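
The rank test mentioned above can be sketched for this two-state
example as follows. This is a minimal illustration, assuming the
linear form $\dot{x} = Ax + b$ with $A = [[0, 1], [0, 0]]$ (the constant
-g enters only through b and does not affect the rank); the function
name `observability_rank_2x2` is hypothetical.

```python
def observability_rank_2x2(A, C):
    """Rank of the observability matrix [C; C A] for two states, one output."""
    # Row vector C times matrix A.
    CA = [C[0] * A[0][0] + C[1] * A[1][0],
          C[0] * A[0][1] + C[1] * A[1][1]]
    det = C[0] * CA[1] - C[1] * CA[0]
    if det != 0:
        return 2
    # Rank 1 if any entry is nonzero, otherwise rank 0.
    return 1 if any(v != 0 for v in C + CA) else 0


A = [[0.0, 1.0],
     [0.0, 0.0]]

rank_position = observability_rank_2x2(A, [1.0, 0.0])  # measure x1
rank_velocity = observability_rank_2x2(A, [0.0, 1.0])  # measure x2
```

Full rank (2) is obtained when position is measured and rank 1 when
only velocity is measured, which agrees with the connectedness
argument of Example 2-3.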


Example  2-4 

    $\dot{x}_1 = x_1 + u$,

    $\dot{x}_2 = 2x_1 - x_2 + 3x_3 + 2u$,


    $y = 2x_2 + x_3$,

then

    $y' = 4x_1 - 2x_2 + 7x_3 + 2u$,

    $y'' = 2x_2 + x_3 = y$.

Only $x_2$, $x_3$ can be obtained uniquely if $x_1$ is given, i.e.,
$x_1$ is unobservable. Decoupling procedures show that $x_1$ is
unobservable.

Example  2-5 

A gyrocompass precessional motion is described as [17]

    $\dot{x}_1 = a x_2 + b x_3$,  a > 0, $b = a(1-\rho)$, $0 < \rho < 1$,

    $\dot{x}_2 = -c x_1 - d x_1^3$,

    $\dot{x}_3 = -F x_2 - F x_3$,  F > 0,

    $y = x_1$,  (2-49)

then

    $y' = a x_2 + b x_3$,  (2-50)

    $y'' = -acy - ady^3 - bF(x_2 + x_3)$,  (2-51)

    $\det J = bF(b-a) \neq 0$.

From (2-49) ~ (2-51),

    $x_1 = y$,

    $x_2 = -\dfrac{acy + ady^3 + Fy' + y''}{F(b-a)}$,

    $x_3 = \dfrac{bFy' + a(acy + ady^3 + y'')}{bF(b-a)}$.

Clearly, all the states are observable from the last three equations.
So, the system is observable.
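
The closed-form inversion of Example 2-5 can be checked numerically
with a short sketch. The parameter values and the test state below
are arbitrary assumptions chosen only for illustration, and the
function names `gyro_measurements` and `gyro_invert` are hypothetical.

```python
def gyro_measurements(x, a, b, c, d, F):
    """Return (y, y', y'') for the gyrocompass model of Example 2-5."""
    x1, x2, x3 = x
    y = x1
    dy = a * x2 + b * x3
    ddy = -a * c * y - a * d * y**3 - b * F * (x2 + x3)
    return y, dy, ddy


def gyro_invert(y, dy, ddy, a, b, c, d, F):
    """Recover (x1, x2, x3) from (y, y', y''); requires b != a, b*F != 0."""
    q = a * c * y + a * d * y**3 + ddy
    x1 = y
    x2 = -(q + F * dy) / (F * (b - a))
    x3 = (b * F * dy + a * q) / (b * F * (b - a))
    return x1, x2, x3


# Assumed illustrative values (not from the text).
a, rho, c, d, F = 2.0, 0.25, 1.5, 0.3, 0.8
b = a * (1.0 - rho)
state = (0.4, -1.1, 0.7)

recovered = gyro_invert(*gyro_measurements(state, a, b, c, d, F),
                        a, b, c, d, F)
```

Because each state has a single expression in terms of Y, the
recovered state coincides with the original one up to rounding.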


Example 2-6 [3], [13]

    $\dot{x}_1 = x_2 x_3$,

    $\dot{x}_2 = -x_1 x_3$,

    $\dot{x}_3 = 0$,

    $y = x_1$,  (2-53)

then

    $y' = x_2 x_3$,  (2-54)

    $y'' = -x_1 x_3^2 = -y x_3^2$.  (2-55)

So, $\det J = -2x_1 x_3^2 \neq 0$ implies that an initial state of the
form $\{x_{10} \neq 0, x_{30} \neq 0\}$ satisfies the connectedness
condition. But from (2-53) to (2-55),

    $x_1 = y$,

    $x_3 = \pm(-y''/y)^{1/2}$,

    $x_2 = y'/x_3$,

so $x_2$ and $x_3$ have multiple expressions, or two covers. So, the
univalence condition is not satisfied. The system is only observable
in the wide sense if $\{x_{10} \neq 0, x_{30} \neq 0\}$.
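
The two covers of Example 2-6 can be made concrete with a small
sketch: two different states produce exactly the same measurement
vector (y, y', y''). The numerical state below is an arbitrary
assumption, and the helper names `measurements_26` and `preimages_26`
are hypothetical.

```python
import math


def measurements_26(x):
    """Return (y, y', y'') for the system of Example 2-6."""
    x1, x2, x3 = x
    return (x1, x2 * x3, -x1 * x3**2)


def preimages_26(y, dy, ddy):
    """All states consistent with (y, y', y''); assumes y != 0 and -y''/y > 0."""
    root = math.sqrt(-ddy / y)
    return [(y, dy / x3, x3) for x3 in (root, -root)]


state = (1.0, 2.0, 0.5)      # assumed illustrative state
mirror = (1.0, -2.0, -0.5)   # sign-flipped x2 and x3

covers = preimages_26(*measurements_26(state))
```

Both `state` and `mirror` appear in `covers`, so the measurement
vector alone cannot distinguish them: connectedness holds but
univalence does not.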

Example 2-7 [12]

    $\dot{x}_1 = x_2$,

    $\dot{x}_2 = -2x_1 - 3x_2 - x_1^3 x_3$,

    $\dot{x}_3 = -x_3 x_4$,

    $y = x_1$.  (2-56)

So,

    $y' = x_2$,  (2-57)

    $y'' = -2x_1 - 3x_2 - x_1^3 x_3 = -2y - 3y' - y^3 x_3$,  (2-58)

    $y''' = -2y' - 3y'' - 3y^2 y' x_3 + y^3 x_3 x_4$.  (2-59)

$\det J = -y^6 x_3 = -x_1^6 x_3 \neq 0$ implies that connectedness is
satisfied when $\{x_{10} \neq 0, x_{30} \neq 0\}$. Here, note that
(2-56), (2-57) are absolutely independent functions. So,
$\det J = 0$ is allowed as long as $\det \bar{J} \neq 0$, where
$\bar{J}$ is the Jacobian after deleting any one of the two absolutely
independent functions. In this case only

    $x_{20} = 0$

is allowed, since $x_1 = 0$ makes $\det \bar{J} = 0$.

From (2-56) ~ (2-59),

    $x_1 = y$,

    $x_2 = y'$,

    $x_3 = -\dfrac{2y + 3y' + y''}{y^3}$,

    $x_4 = -\dfrac{2y' + 3y'' + y'''}{2y + 3y' + y''} + \dfrac{3y'}{y}$.

Obviously, the univalence condition is satisfied. So, the system is
observable if $\{x_{10} \neq 0, x_{30} \neq 0\}$ is preserved.
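
The unique expressions for the states of Example 2-7 can be verified
numerically. The sketch below uses only relations (2-56) to (2-59) and
an arbitrary test state with nonzero $x_1$ and $x_3$; the function
names `measurements_27` and `invert_27` are hypothetical.

```python
def measurements_27(x):
    """Return (y, y', y'', y''') from (2-56)-(2-59)."""
    x1, x2, x3, x4 = x
    y = x1
    dy = x2
    ddy = -2 * y - 3 * dy - y**3 * x3
    dddy = -2 * dy - 3 * ddy - 3 * y**2 * dy * x3 + y**3 * x3 * x4
    return y, dy, ddy, dddy


def invert_27(y, dy, ddy, dddy):
    """Recover (x1, x2, x3, x4); assumes y != 0 and x3 != 0."""
    s = 2 * y + 3 * dy + ddy          # equals -y**3 * x3
    x3 = -s / y**3
    x4 = -(2 * dy + 3 * ddy + dddy) / s + 3 * dy / y
    return y, dy, x3, x4


state = (0.5, -0.2, 1.5, 0.7)         # assumed illustrative state
recovered = invert_27(*measurements_27(state))
```

Each state is recovered through a single expression, which is the
univalence property claimed in the example.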

Two  practically  more  important  examples  are  shown  in  the  next 
section  which  will  be  used  also  for  stochastic  system  observability. 

2-4  BOT  and  array  SONAR  tracking  examples 

System observability determination for two important examples in
underwater tracking is demonstrated here. The first example is a
bearing-only-target tracking problem where only bearing information of
the target is extracted from the measurement device and used to
determine the observability of the other state variables as well as
whole system observability.

Consider an object or target (T) and observer or ownship (O)
configuration as in Figure 2. When T and/or O move with velocity
components $v_{Tx}$, $v_{Ty}$, $v_{Ox}$, $v_{Oy}$, relative coordinates
x(t) and y(t) can be generated as

    $x(t) = x_T(t) - x_O(t)$,  (2-60)

    $y(t) = y_T(t) - y_O(t)$.  (2-61)

Define the state variables in mixed coordinates, which consist of
mixed components of polar and rectangular coordinates, as

    $x_1(t) = \beta(t)$,  (2-62)

    $x_2(t) = r(t)$,  (2-63)

    $x_3(t) = v_{Tx}(t) - v_{Ox}(t) = v_x(t)$,  (2-64)

    $x_4(t) = v_{Ty}(t) - v_{Oy}(t) = v_y(t)$,  (2-65)

where $\beta(t)$ is the bearing of T from O with respect to some
reference, North N here, and r(t) is the range. Then from the
relations

    $x(t) = r(t)\sin\beta(t)$,  (2-66)

    $y(t) = r(t)\cos\beta(t)$,  (2-67)

and their derivatives with proper algebra, the state equation in this
coordinate system becomes

    $\dot{x}(t) = \begin{bmatrix} (x_3\cos x_1 - x_4\sin x_1)/x_2 \\ x_3\sin x_1 + x_4\cos x_1 \\ a_x(t) \\ a_y(t) \end{bmatrix}$,  (2-68)

where $a_x(t)$, $a_y(t)$ are accelerations in their directions. Due to
the bearing measurement the observation equation is

    $y(t) = x_1(t)$.  (2-69)


To make the system simpler, it is assumed that $a_x(t) = 0$,
$a_y(t) = a(t) \neq 0$ in (2-68), i.e., maneuvering exists only in the
y-direction. The successive replacements yield

    $y = x_1$,  (2-70)

    $y' = \dfrac{x_3\cos y - x_4\sin y}{x_2}$,  (2-71)

    $y'' = -\dfrac{a\sin y + 2y'\cos y \cdot x_4 + 2y'\sin y \cdot x_3}{x_2}$,  (2-72)

    $y''' = -\dfrac{3ay'\cos y + [3y''\sin y + 2(y')^2\cos y]x_3 + [3y''\cos y - 2(y')^2\sin y]x_4 + a'\sin y}{x_2}$.  (2-73)

So, from (2-70)-(2-73),

    $x_1 = y$,  (2-74)

    $x_2 = \dfrac{-2y'x_4 - a\cos y\sin y}{y''\cos y + 2(y')^2\sin y}$,  (2-75)

    $x_3 = \dfrac{[y''\sin y - 2(y')^2\cos y]x_4 - ay'\sin y}{y''\cos y + 2(y')^2\sin y}$,  (2-76)

    $x_4 = \dfrac{a[4(y')^3\cos y\sin y + 6y'y''\cos^2 y - 3y'y'' - y'''\cos y\sin y] + a'\sin y[y''\cos y + 2(y')^2\sin y]}{2y'y''' - 3(y'')^2 + 4(y')^4}$.  (2-77)

From (2-77) it is clear that $x_4$ is connected to the measurement
vector Y and that it is unique when a(t) and/or a'(t) are nonzero,
i.e., when maneuvering exists. This implies from (2-75) and (2-76)
that $x_2$ and $x_3$ are also uniquely connected to Y. So, the system
satisfies the connectedness condition if T and/or O maneuver. But when
a(t) = 0, a'(t) = 0, i.e., when non-maneuvering, (2-77) says that
$x_4$ is not connected to Y and is unobservable. This in turn implies
from (2-75) and (2-76) that $x_2$, $x_3$ are disconnected from Y, and
thus these states are unobservable from Y. Only $x_1$, which is itself
a measurement variable, is observable in this case. After lengthy
computation, the determinant of the Jacobian becomes

    $\det J = -2a'y'\sin y + 3a[2(y')^2\cos y + y''\sin y] - [12y'y''\sin y(1 + \cos^2 y) + 8\cos^3 y\,(y')^3]x_3 + 4y'\cos y\sin y[2(y')^2\cos y + 3y''\sin y]x_4$.  (2-78)


From (2-78) the system is unobservable with det J = 0 for the
following cases:

1. Infinite range, $x_2 = \infty$,

2. Non-maneuvering, $x_3 = x_4 = 0$ with $a(t) = a'(t) = 0$
(including parallel stationary movement and chasing),

3. Zero bearing rate and acceleration, $\beta'(t) = \beta''(t) = 0$,

4. Constant range with special heading such that

    $\tan\beta = \dfrac{6a(\beta')^2}{2a'\beta' - 3a\beta''}$.  (2-79)

In these cases, as well as certain others, the system is unobservable
due to the lack of rank when any one or more of the above conditions
are satisfied. Consequently, from (2-74)-(2-78), it is shown again
that for BOT tracking, the system is observable only when maneuvering
exists.
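
The inversion (2-74)-(2-77) can be checked numerically with a short
sketch. It assumes the mixed-coordinate model (2-68) with
$a_x = 0$, $a_y = a$, builds $y'$, $y''$, $y'''$ from a test state via
(2-71)-(2-73), and then recovers the state. The test state, the values
of a and a', and the function names `bot_measurements` and
`bot_invert` are illustrative assumptions.

```python
import math


def bot_measurements(x, a, da):
    """Return (y, y', y'', y''') from (2-70)-(2-73); da stands for a'."""
    x1, x2, x3, x4 = x
    s, c = math.sin(x1), math.cos(x1)
    y = x1
    dy = (x3 * c - x4 * s) / x2
    ddy = -(a * s + 2 * dy * (x3 * s + x4 * c)) / x2
    dddy = -(3 * a * dy * c
             + (3 * ddy * s + 2 * dy**2 * c) * x3
             + (3 * ddy * c - 2 * dy**2 * s) * x4
             + da * s) / x2
    return y, dy, ddy, dddy


def bot_invert(y, dy, ddy, dddy, a, da):
    """Recover the state from (2-74)-(2-77); needs nonzero denominators."""
    s, c = math.sin(y), math.cos(y)
    D = ddy * c + 2 * dy**2 * s                       # denominator of (2-75), (2-76)
    E = 2 * dy * dddy - 3 * ddy**2 + 4 * dy**4        # denominator of (2-77)
    x4 = (a * (4 * dy**3 * c * s + 6 * dy * ddy * c**2
               - 3 * dy * ddy - dddy * c * s)
          + da * s * D) / E
    x2 = (-2 * dy * x4 - a * c * s) / D
    x3 = ((ddy * s - 2 * dy**2 * c) * x4 - a * dy * s) / D
    return y, x2, x3, x4


state = (0.6, 5.0, 1.2, -0.8)   # bearing (rad), range, relative velocities
a, da = 0.3, 0.05               # maneuver acceleration and its rate (assumed)

recovered = bot_invert(*bot_measurements(state, a, da), a, da)
```

With a = a' = 0 the numerator of (2-77) vanishes identically, so
`bot_invert` returns no information about $x_4$ in the non-maneuvering
case, which mirrors the unobservability discussed above.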

The second application example is the underwater SONAR tracking
problem where the number of sensors, deployment and measurement
schemes are changed. For good system observability, the number of
sensors and their configuration are very important. Further, with the
same number of sensors and the same deployment structures, the
measurement policy is even more important in many cases. One can
measure absolute wave-propagation time delay between the target and a
sensor, or time delay difference between two sensors, Doppler or
Doppler difference, or any combination thereof. Each of these
measurement policies requires a different observability analysis.
Deployment can be considered as either horizontal (towed linear array)
or vertical to the surface (vertically planted array). Figure 3 shows
the sensor and target configuration for up to three sensors which are
deployed vertically. Only the directly propagated wave is considered
here. In the one-sensor case, only absolute time delay or absolute
Doppler shift between T and $S_2$ can be measured. It implies that
synchronization of T and $S_2$ is required for the passive case, or
that this scheme can be used only for the active SONAR case.

Figure 3. Sensor configuration (panel 1: one-sensor)

In two-sensor measurement, either absolute quantities or comparative
differences of intersensor delay and/or Doppler can be measured.

Here three measurement policies are considered:

1. One relative delay; 2S1D

2. One relative Doppler; 2S1P

3. One relative delay and Doppler; 2S1D1P

In the three-sensor deployment, several possible measurements are
considered as follows:

1. Two relative delay; 3S2D

2. Three relative delay; 3S3D

3. Two relative delay and one Doppler; 3S2D1P

Of course, more than three sensors can be considered. But it is known
[68] that for optimal range and bearing estimation in the sense of a
minimum uncertainty ellipse, the best array configuration of M sensors
is three groups of M/3 sensors each with equal spacing between groups.
In this case, all sensors in a "pod" are assumed to be in the same
location, i.e., there is no delay between sensors in the same group.
Equally spaced M sensors showed much inferior performance to the three
clusters of M/3 sensors, except as $M \to \infty$. So, the number of
sensors considered here is limited to three.


In a two-dimensional coordinate system, at least four states are
required to describe the motion of the point target: two for position
and two for velocity in each direction, respectively. Since sound
speed varies quite significantly with depth, salinity and temperature,
especially in coastal inlets [64], [69], [70], it affects the time
delay and the Doppler shift. So, it is also considered as a state
variable.

I.e., define the state variables as follows:
$x_1$ is target position in x-direction,
$x_2$ is target velocity in x-direction,
$x_3$ is target position in y-direction,
$x_4$ is target velocity in y-direction,
$x_5$ is $C_1$ (acoustic wave speed in $R_1$),
$x_6$ is $C_2$ (acoustic wave speed in $R_2$).

With the above state the system equations can be written as (under
the assumption of constant wave speed in depth)

    $\dot{x}(t) = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} x(t)$.  (2-80)

The basic measured quantities are the time delay difference
$\tau_{ij}$ between sensors i and j, and the Doppler frequency shift
difference $f_{ij}$ from the carrier frequency $f_c$ = 3500 Hz, which
seems widely used in practical SONAR systems. So, for example, if two
delays and one Doppler shift are measured with three sensors (3S2D1P),
the observation equation becomes

    $y(t) = \begin{bmatrix} \tau_{12}(t) \\ f_{12}(t) \\ \tau_{32}(t) \end{bmatrix} = \begin{bmatrix} \dfrac{R_2(t)}{C_2(t)} - \dfrac{R_1(t)}{C_1(t)} \\ f_c\left\{\dfrac{\dot{R}_2(t)}{C_2(t)} - \dfrac{\dot{R}_1(t)}{C_1(t)}\right\} \\ \dfrac{R_2(t)}{C_2(t)} - \dfrac{R_3(t)}{C_3} \end{bmatrix}$

    $= \begin{bmatrix} \dfrac{(x_1^2 + x_3^2)^{1/2}}{x_6} - \dfrac{(x_1^2 + (x_3 - z_2)^2)^{1/2}}{x_5} \\ f_c\left\{\dfrac{x_1x_2 + x_3x_4}{x_6(x_1^2 + x_3^2)^{1/2}} - \dfrac{x_1x_2 + (x_3 - z_2)x_4}{x_5(x_1^2 + (x_3 - z_2)^2)^{1/2}}\right\} \\ \dfrac{(x_1^2 + x_3^2)^{1/2}}{x_6} - \dfrac{R_3(t)}{C_3} \end{bmatrix}$

    $= h(x(t), f_c, C_3)$,  (2-81)

where the surface sound speed $C_3$ is assumed to be a known value.

The other cases of measurement equations have a similar form except
that different quantities are measured. Therefore, in all cases, the
system equations are simple linear equations if nonlinear drag, etc.,
are neglected. But the observation equations are nonlinear.

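To examine deterministic observability for this system, categorize
the measurement schemes into three groups for convenience as

1. An absolute delay; 1S1D

2. Pure relative delay; 2S1D, 3S2D, 3S3D

3. Measurements including Doppler; 2S1P, 2S1D1P, 3S2D1P

In the first case, when an absolute delay is measured as in 1S1D,

    $y = \dfrac{R_2}{x_6} = \dfrac{(x_1^2 + x_3^2)^{1/2}}{x_6}$,  (2-84)

    $y' = \dfrac{x_1x_2 + x_3x_4}{x_6R_2}$,  (2-85)

    $y'' = \dfrac{x_2^2 + x_4^2}{x_6R_2} - \dfrac{(y')^2}{y}$.  (2-86)

Let

    $A = \dfrac{(y')^2}{y}$,

    $B = 2(y')^2 - yy''$,

then,

    $y''' = -\dfrac{y'(x_2^2 + x_4^2)}{x_6R_2\,y} - A'$,  (2-87)

    $y^{(4)} = \dfrac{(x_2^2 + x_4^2)B}{x_6R_2\,y^2} - A''$,  (2-88)

    $y^{(5)} = \dfrac{(x_2^2 + x_4^2)}{x_6R_2\,y^3}(yB' - 3By') - A'''$.  (2-89)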

From (2-84)-(2-89), it is clear even before solving them for x that
$x_5$ does not appear in any equation explicitly. So, $x_5$ is not
connected to the measurement vector Y,

    $Y = \{y, y', \ldots, y^{(5)}\}$.

Obviously $x_5$ is unobservable, and makes the system unobservable
deterministically. Actual solution of these equations shows that the
other variables have multiple solutions, i.e., they are connected to Y
multiply; thus they are observable at least in a wide sense.

In the second case, when pure relative delay is measured as in 2S1D,
for example, then

    $y = \dfrac{(x_1^2 + x_3^2)^{1/2}}{x_6} - \dfrac{[x_1^2 + (x_3 - z_2)^2]^{1/2}}{x_5}$,  (2-90)

    $y' = \dfrac{x_1x_2 + x_3x_4}{x_6R_2} - \dfrac{x_1x_2 + (x_3 - z_2)x_4}{x_5R_1}$.  (2-91)

Continuation up to (n-1)th order derivatives shows that the results
are almost identical to the first case except that $x_5$ appears in
the expressions. It implies immediately that all the states are
observable at least in a wide sense. When more measurements are added
by the addition of more sensors, as in 3S2D or 3S3D, the system
becomes more observable because the possibility of a unique solution
in terms of the state x increases.


The last case, when the measurement equations include Doppler shift
as in 2S1P, 2S1D1P or 3S2D1P, shows very interesting results. For
example, when observing one Doppler shift in a two-sensor deployment
(2S1P),

    $y = f_{12} = f_c\left\{\dfrac{x_1x_2 + x_3x_4}{x_6R_2} - \dfrac{x_1x_2 + (x_3 - z_2)x_4}{x_5R_1}\right\} = f_c\,y_D'$,  (2-92)

where $y_D'$ is the time derivative of the delay (2-91). Continuation
gives

    $y' = f_c\,y_D''$,

    $y'' = f_c\,y_D'''$,

    $\vdots$

    $y^{(5)} = f_c\,y_D^{(6)}$.  (2-93)

The Doppler measurement is just a scaled version of the delay
derivative of one order higher, with scaling factor $f_c$. However, as
discussed earlier, the 2S1D system itself is already observable (at
least in a wide sense). So, this system is also observable in the same
sense. The same argument can be applied to the 2S1D1P or 3S2D1P
measurement cases as well. Thus the Doppler measurement system is
observable deterministically as long as a delay measurement system is
observable. Of course, the scaling factor influences the magnitude of
the information obtained from the measurement. The effect of this will
be discussed in Chapter Four where information structures of the
various

CHAPTER  3:  INFORMATION-THEORETIC  OBSERVABILITY 
OF  STOCHASTIC  SYSTEMS 

3-1.  Introduction  to  information  theory 

The involvement of noise in the stochastic system description makes
it very difficult to extend the deterministic system observability
condition to the stochastic system case. A "yes" or "no" type answer
to the observability question has little meaning in this case.
Attempts on this problem must be interpreted in a probabilistic sense.

Contrary to the former results [34]-[39], where Fisher information is
mainly used to study stochastic observability, here Shannon
information is utilized instead. Specifically, mutual information is
computed and used as a criterion to determine the degree of
observability of any state or of the whole system.

Information theory has two general orientations: one developed by
Wiener and another by Shannon. Although both Wiener and Shannon shared
a common probabilistic basis, there is some distinction between them.
The significance of Wiener's work is that, if a signal is corrupted by
some noise, an attempt is made to recover the signal from the
corrupted one. It is for this purpose that Wiener originated optimum
filtering theory. However, Shannon's work goes to the next step. He
showed that the signal can be transferred optimally provided it is
properly formed. That is, the signal to be transferred can be
processed before and after sending to counter the disturbance and to
be recovered properly at the destination. For this purpose, Shannon
developed the theories of information measure, channel capacity,
coding processors, and so on.

To define the information measure, consider the simple information
channel of Figure 4 and assume that $x_i$ is an input event and $y_j$
is a corresponding output event, i = 1, 2, ..., n, j = 1, 2, ..., m.
Now define a measure of the amount of information provided by the
output (or measurement) $y_j$ about the input $x_i$. It is not
difficult to expect that the transmission of $x_i$ through the noisy
channel causes a change in the probability of $x_i$ from an a priori
$p(x_i)$ to an a posteriori $p(x_i|y_j)$. In measuring this change,
take the logarithmic ratio of the two probabilities. It turns out to
be appropriate for the definition of the information measure, which
was first suggested by Hartley [40]. I.e., the amount of information
provided by $y_j$ about $x_i$ can be defined as [40], [41]

    $I(x_i, y_j) = \log_2 \dfrac{p(x_i|y_j)}{p(x_i)}$, bits,

    $\qquad = \log_{10} \dfrac{p(x_i|y_j)}{p(x_i)}$, hartleys,

    $\qquad = \ln \dfrac{p(x_i|y_j)}{p(x_i)}$, nats.  (3-1)

Figure 4. Input-output block diagram for information channel (with
noise source)

(3-1) is defined by Shannon and used as a measure of mutual
information between events $x_i$ and $y_j$. If $p(x_i|y_j) = 1$,

    $I(x_i, y_j) = I(x_i) = \ln(1/p(x_i)) = -\ln p(x_i)$.  (3-2)

(3-2) is called self information. If (3-2) is true for all i, then
the channel is noiseless. The averaged amount of information, which
is represented by H(x),

    $H(x) = \sum_{i=1}^{n} p(x_i) I(x_i) = -\sum_{i=1}^{n} p(x_i) \ln p(x_i)$,  (3-3)

has traditionally been called "information entropy," or just
"entropy," of x. In statistical thermodynamics H is a measure of
"disorder" or "uncertainty." Boltzmann showed [42] that in an isolated
thermodynamic system H could never decrease, i.e., the system tends to
its maximum disorder.


To decrease the entropy, one must add information to the system
either by transferring entropy out of the system boundary or by making
an observation (measurement). Here we are interested in the latter
method. I.e., to decrease the uncertainty of the general stochastic
system, measurements will be made and the decrease in uncertainty
observed; this quantity will then be used as a test criterion for the
observability of the system. For an n-dimensional random vector x
with continuous probability density p(x), and with natural logarithm
base, H(x) becomes

    $H(x) = \int_x p(x) \ln \dfrac{1}{p(x)}\,dx = -\int_x p(x) \ln p(x)\,dx = -E[\ln p(x)]$,  (3-4)

where E is the expectation operator.
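
The discrete quantities (3-1) to (3-3) can be illustrated with a short
sketch. The average of (3-1) over the joint distribution is used here
as the mutual information between input and output; the binary
symmetric channel with crossover probability 0.1 and the function
names are illustrative assumptions, not taken from the text.

```python
import math


def entropy(p):
    """Entropy (3-3) in nats of a discrete distribution p."""
    return -sum(q * math.log(q) for q in p if q > 0)


def event_information(prior, posterior):
    """Information (3-1) in nats provided by y_j about x_i."""
    return math.log(posterior / prior)


def mutual_information(joint):
    """Average of (3-1) over a joint table joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                total += pxy * math.log(pxy / (px[i] * py[j]))
    return total


# Assumed example: uniform binary input, crossover probability 0.1.
noisy = [[0.45, 0.05],
         [0.05, 0.45]]
noiseless = [[0.5, 0.0],
             [0.0, 0.5]]

info_noisy = mutual_information(noisy)
info_noiseless = mutual_information(noiseless)
```

For the noiseless channel the average information equals the input
entropy H(x), while noise reduces it; this is the sense in which the
measurement decreases the a priori uncertainty.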

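Another quantity of information content which is commonly used is the
Fisher information. For the same x and density p(x), Fisher
information is defined as [43]-[47] and [66]

    $J(x) = -\int_x p(x) \dfrac{\partial^2 \ln p(x)}{\partial x\,\partial x^T}\,dx$

    $\qquad = \int_x p(x) \left(\dfrac{\partial \ln p(x)}{\partial x}\right)\left(\dfrac{\partial \ln p(x)}{\partial x}\right)^T dx$

    $\qquad = \int_x \dfrac{1}{p(x)} \left(\dfrac{\partial p(x)}{\partial x}\right)\left(\dfrac{\partial p(x)}{\partial x}\right)^T dx$.  (3-5)

The identity

    $\dfrac{\partial \ln p(a)}{\partial a} = \dfrac{1}{p(a)}\dfrac{\partial p(a)}{\partial a}$

was used in the last equality of (3-5). More compactly, (3-5) becomes

    $J(x) = -E\left[\dfrac{\partial^2 \ln p(x)}{\partial x\,\partial x^T}\right] = E\left[\left(\dfrac{\partial \ln p(x)}{\partial x}\right)\left(\dfrac{\partial \ln p(x)}{\partial x}\right)^T\right]$.  (3-6)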

From the two definitions (3-4) and (3-5) above, it is clear that the
Fisher information J is an n x n matrix quantity and that the Shannon
information H is a scalar-valued quantity. The general relation
between these two information concepts will be discussed briefly
later. However, immediate comparison of (3-4) and (3-5) shows that a
simple relation can be derived if a specific density p(x) is given for
a random variable x. For example, a scalar random variable X with
Gaussian density having zero mean and variance $\sigma^2$ has the
Fisher information

$$J(X) = -E\left[\frac{\partial^2 \ln p(x)}{\partial x^2}\right] = \frac{1}{\sigma^2}. \tag{3-7}$$

Meanwhile its entropy is

$$H(X) = -E[\ln p(X)] = \tfrac{1}{2}\ln(2\pi e\,\sigma^2). \tag{3-8}$$

So, from (3-7), (3-8) one can get the relation

$$\frac{dH(X)}{d(\sigma^2)} = \tfrac{1}{2}\,J(X). \tag{3-9}$$

A generalization of this relation can be found in [43] and [44].

Appendix B shows that the maximum-entropy density function varies
depending on the constraints which are imposed on the density p(x).
The Gaussian density has maximum entropy for a given mean and
variance when x ranges from $-\infty$ to $+\infty$.

It is known [48, and from private communication with R.W.
Hamming, Naval Postgraduate School, March 1985] that the entropy H(x)
of commonly used random variables and the variance $\sigma^2$ have the
one-to-one relation

$$H(x) = \tfrac{1}{2}\ln(A\sigma^2), \tag{3-10}$$

if the density and expectation of x exist. So, for example, the
inverse-Gaussian or Cauchy density does not have the relation (3-10)
due to nonexistence of mean and variance expressions. The constant A
is determined once the density is known. For example, $A = 2\pi e$ for
the Gaussian case, from (3-8).

Table 1 shows this relationship for some commonly used densities
[48].
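As a small numerical illustration of (3-10), the sketch below solves the relation for the constant A using well-known closed-form differential entropies of three densities. It is not a reproduction of Table 1; the function name `entropy_constant` and the chosen variance are hypothetical, and only the Gaussian value $2\pi e$ is stated in the text.

```python
import math

def entropy_constant(entropy, variance):
    """Solve H = 0.5*ln(A*sigma^2) for A, given H in nats and the variance."""
    return math.exp(2.0 * entropy) / variance

sigma2 = 2.5  # arbitrary variance shared by all three densities

# Gaussian: H = 0.5*ln(2*pi*e*sigma^2)
h_gauss = 0.5 * math.log(2.0 * math.pi * math.e * sigma2)

# Uniform on an interval of width w: H = ln(w), variance = w^2/12
w = math.sqrt(12.0 * sigma2)
h_unif = math.log(w)

# Laplace with scale b: H = 1 + ln(2b), variance = 2*b^2
b = math.sqrt(sigma2 / 2.0)
h_lap = 1.0 + math.log(2.0 * b)

A_gauss = entropy_constant(h_gauss, sigma2)  # analytically 2*pi*e
A_unif = entropy_constant(h_unif, sigma2)    # analytically 12
A_lap = entropy_constant(h_lap, sigma2)      # analytically 2*e^2

print(A_gauss, A_lap, A_unif)
```

For a fixed variance the Gaussian gives the largest A, which is consistent with its maximum-entropy property under mean and variance constraints.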


contaminated received signal [41], [49], [50]. The extended
application of the mutual information to a more general system, to
identify unknown parameters, was tried by Weidemann and Stear [51].
Later, with the help of measure theory, its utilization was widened
into the area of filtering of general stochastic systems [45], [46],
[52]-[54]. Here a further attempt is made to apply the same concept to
the observability problem. The main feature of this approach lies in
the transition of the definition of the term "information" from Fisher
to Shannon, i.e., the meaning of information here is understood in the
sense of Shannon.


Define two random vectors x and y as

$$x = (x_1, x_2, \ldots, x_n)^T, \qquad y = (y_1, y_2, \ldots)^T,$$

and assume a joint density p(x,y); the marginal densities p(x) and
p(y) are defined as usual. Then the entropy of x, H(x), is defined by
(3-4). The entropy of y, H(y), is defined similarly:

$$H(y) = -E[\ln p(y)].$$

In the same context the conditional entropy H(x|y) can be defined as
in [41], [51]-[54], i.e., for a given conditional density p(x|y) and a
chosen specific value $y = \eta$,

$$H(x|\eta) = -\int_x p(x|\eta)\ln p(x|\eta)\,dx. \tag{3-11}$$

Averaging over all possible values of y gives

$$H(x|y) = \int_y p(\eta)\,H(x|\eta)\,d\eta
 = -\int_{x,y} p(\eta)\,p(x|\eta)\ln p(x|\eta)\,dx\,d\eta
 = -\int_{x,y} p(x,y)\ln p(x|y)\,dx\,dy
 = -E[\ln p(x|y)]. \tag{3-12}$$

Next, define the joint entropy H(x,y) in a similar way as

$$H(x,y) = -\int_{x,y} p(x,y)\ln p(x,y)\,dx\,dy = -E[\ln p(x,y)]. \tag{3-13}$$

With the above definitions, the mutual information between x and y is
derived.

Based on the definition (3-1), the average mutual information of x
for a specific value $y = \eta$ is termed the conditional mutual
information [41] $I(x,\eta)$, which is expressed as

$$I(x,\eta) = \int_x p(x|\eta)\,i(x,\eta)\,dx
 = \int_x p(x|\eta)\ln\frac{p(x|\eta)}{p(x)}\,dx. \tag{3-14}$$

$I(x,\eta)$ is the measure of the information gain which is provided by
the measurement $y = \eta$. So, averaging (3-14) over all possible
values of y yields the formal definition of the mutual information
I(x,y) [41], [45], [51]-[54] as


Figure 5. Entropy and mutual information (diagram labels: H(x), H(y), H(x,y))


I.e., mutual information is the common portion of the information H(x)
and H(y). So, it is clear from (3-15) that if x and y are
independent, i.e.,

$$p(x|y) = p(x),$$

then I(x,y) is always zero, since ln(1) = 0 and there is no common
portion in Figure 5.
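The "common portion" interpretation can be checked numerically on a discrete analog. The sketch below is an illustration only: the text works with continuous densities, the joint table is hypothetical, and the overlap identity used for the check, I(x,y) = H(x) + H(y) - H(x,y), is the standard one rather than a formula quoted from this section.

```python
import math

def entropy(probs):
    """Shannon entropy in nats of a list of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mutual_information(joint):
    """I(x,y) = sum of p(x,y) * ln[p(x,y) / (p(x) p(y))] over a joint pmf table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:
                total += pxy * math.log(pxy / (px[i] * py[j]))
    return total

# Hypothetical joint pmf: rows index a binary state x, columns a binary observation y.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]
h_x, h_y = entropy(px), entropy(py)
h_xy = entropy([p for row in joint for p in row])
i_xy = mutual_information(joint)

# Independent case: p(x,y) = p(x) p(y), so every log term is ln(1) = 0.
indep = [[px[i] * py[j] for j in range(2)] for i in range(2)]
i_indep = mutual_information(indep)

print(i_xy, h_x + h_y - h_xy, i_indep)
```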

1.  Properties of I(x,y)

Mutual information has the following important properties:

1)  $I(x,y) = I(y,x) \ge 0$

This inequality is called the "Shannon inequality." Mutual
information is always greater than zero except in the case where x and
y are stochastically independent.

2)  $I(x,y) \ge I(x, L(y))$

Some information is lost by the transformation L, where L(y) is any
mapping defined on the domain of y. Equality holds if and only
if the mapping is one-to-one and onto. The loss of information depends
on the relation

$$H(y) = H(x) + E[\ln|J|],$$

where y = f(x) and J is the Jacobian of f(x).

3)  $$I(x,y) \ge I(z,y), \tag{3-17}$$

where z = f(x,N) and N is a random function or variable. Information
loss is also incurred due to the random term in the transformation.

4)  The information about x increases monotonically as more
observations are taken, i.e.,

$$I(x_1,\ldots,x_k;\; y_1,\ldots,y_M) \le I(x_1,\ldots,x_k;\; y_1,\ldots,y_M, y_{M+1}). \tag{3-18}$$
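Property 2) can be illustrated on a small discrete example. The sketch below is a hypothetical discrete analog (the joint table and the helper `merge_columns` are assumptions for illustration): a many-to-one map L applied to y loses information, while a one-to-one relabelling of y does not.

```python
import math

def mutual_information(joint):
    """I(x,y) in nats for a joint pmf given as a list of rows."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(pxy * math.log(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0.0)

def merge_columns(joint, groups):
    """Apply a deterministic map L to y: each group of y-indices becomes one symbol."""
    return [[sum(row[j] for j in g) for g in groups] for row in joint]

# Hypothetical joint pmf: binary state x (rows), three-valued observation y (columns).
joint = [[0.30, 0.15, 0.05],
         [0.05, 0.15, 0.30]]

i_full = mutual_information(joint)
# Many-to-one map: y-values 0 and 1 are merged into a single symbol.
i_merged = mutual_information(merge_columns(joint, [[0, 1], [2]]))
# One-to-one and onto map: the y-values are only relabelled.
i_relabel = mutual_information(merge_columns(joint, [[2], [0], [1]]))

print(i_full, i_merged, i_relabel)
```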


For our purposes here, the first equality of (3-16) and
property 4) above play the most important role. (3-16) is used to
compute the mutual information between x and y by considering H(x) as
the uncertainty of the system state x before an observation is made and
H(x|y) as the uncertainty of x after an observation is made. Thus
I(x,y) is interpreted here as the uncertainty decrease or,
equivalently, the information increase due to the observation. Since
this uncertainty difference is entirely caused by the observation y,
the mutual information I(x,y) can be used as the measure of the
observability of the system. The increase in information due to an
additional observation can then be evaluated using the inequality
(3-18) of property 4). I.e., the difference

$$I(x_1,\ldots,x_k;\; y_1,\ldots,y_M, y_{M+1}) - I(x_1,\ldots,x_k;\; y_1,\ldots,y_M)$$

is the information change, or information rate, which is caused by the
(M+1)-th observation. In communication theory the maximum mutual
information over p(x) is defined as the channel capacity C,

$$C = \max_{p(x)} I(x,y). \tag{3-19}$$


$$H(y) = \tfrac{1}{2}\ln[2\pi e\,(S + \sigma_n^2)]. \tag{3-25}$$

Thus, from (3-22), (3-25) and the definition (3-16),

$$I(x,y) = H(y) - H(y|x)
 = \tfrac{1}{2}\ln\left(1 + \frac{S}{\sigma_n^2}\right)
 = \tfrac{1}{2}\ln\left(1 + \frac{S}{N}\right), \tag{3-26}$$

where N is the noise power. Note in (3-26) that as the noise power
becomes small, the mutual information increases because H(y|x)
decreases, so the output y approximates the input x more exactly.
Conversely, if $N \to \infty$, i.e., the input is totally "masked" by
the noise, then I(x,y) approaches zero. Then x and y look like
independent signals: no information about x is transferred to y, and
all of the information is lost during the transmission. It is clear
that I(x,y) increases with increasing signal-to-noise ratio (SNR).
Since the correlation coefficient r in this case satisfies

$$r^2 = \frac{S}{S + N},$$

I(x,y) can be obtained in terms of r from (3-26),

$$I(x,y) = -\tfrac{1}{2}\ln(1 - r^2). \tag{3-27}$$
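The behavior described for (3-26) is easy to tabulate. The sketch below is illustrative (function names and numerical values are hypothetical): it evaluates the mutual information for several noise powers and checks that the equivalent correlation form, obtained by substituting $r^2 = S/(S+N)$ into (3-26), gives the same numbers.

```python
import math

def info_from_snr(signal_power, noise_power):
    """Mutual information in nats from (3-26): 0.5*ln(1 + S/N)."""
    return 0.5 * math.log(1.0 + signal_power / noise_power)

def info_from_correlation(r_squared):
    """Same quantity written in terms of the squared correlation coefficient."""
    return -0.5 * math.log(1.0 - r_squared)

S = 4.0                                  # hypothetical signal power
noise_levels = [0.1, 1.0, 10.0, 1.0e6]   # hypothetical noise powers

table = []
for N in noise_levels:
    r2 = S / (S + N)
    table.append((N, info_from_snr(S, N), info_from_correlation(r2)))

for N, i_snr, i_r in table:
    print(f"N = {N:g}: I = {i_snr:.6f} nats (via r: {i_r:.6f})")
```

As N grows the information falls toward zero, which is the "masked input" limit discussed above.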


However, generalization of Shannon's result (3-15) or (3-16) to
continuous random processes needs additional assumptions from the
measure-theoretic point of view. This is discussed next.

First, consider that the observation of the process $x_t$, which is
expressed in terms of the Ito stochastic differential equation
(with the deterministic control u(t) suppressed)

$$dx_t = f(x_t,t)\,dt + G(x_t,t)\,dw_t, \qquad x_{t_0} = x_0, \tag{3-28}$$

is made through another stochastic equation

$$dy_t = h(x_t,t)\,dt + dv_t, \tag{3-29}$$

where $x_t \in R^n$, $y_t \in R^m$; f(.) and h(.) are n- and
m-dimensional vector-valued functions, respectively. $w_t$ and $v_t$
are independent Wiener processes with covariances Q(t), R(t),
independent of $x_{t_0}$. G is a matrix of appropriate dimension.
Assume (3-28), (3-29) satisfy the existence and uniqueness conditions
of the solution in the mean-square sense [34], [36]. Let
$(\Omega, F, \mu)$ be a measure space. Let Y = C[0,T], let $F_Y$ be the
family of Borel sets of Y, and let $F_t^Y$ be the non-decreasing
sub-$\sigma$-algebras of $F_Y$ generated by $\{y_s,\ 0 \le s \le t\}$.
The measure induced by $y_t$ on the space $(Y, F_Y)$ is denoted by
$\mu_y$, and the Wiener measure induced by $v_t$ on $(Y, F_Y)$ is
denoted by $\mu_v$. Let X be the vector space and $F_X$ be the family
of Borel sets of X. $F_t^X$ is also a nondecreasing family of
sub-$\sigma$-algebras of $F_X$. Then the joint measure $\mu_{xy}$
induced by the joint process $(x_t, y_t)$ is defined on the space
$(X \times Y,\ F_X \times F_Y)$.


Further assume that

$$\int_0^T h^T(x_s,s)\,h(x_s,s)\,ds < \infty \quad \text{a.s.} \tag{3-30}$$

Then Gel'fand and Yaglom [55], Liptser and Shiryayev [56], and Duncan
[45] proved that the absolute continuity relations

$$\mu_y \ll \mu_v, \tag{3-31}$$

$$\mu_{xy} \ll \mu_x \times \mu_v \tag{3-32}$$

hold. Further, it is known [46], [56] that the equivalence relations of
the measures

$$\mu_y \sim \mu_v, \qquad \mu_{xy} \sim \mu_x \times \mu_y \sim \mu_x \times \mu_v$$

hold also. Once the absolute continuity condition holds, then by the
Radon-Nikodym theorem [28], [31], [57] there exists a finite,
real-valued, unique F-measurable function $\phi$ on $\Omega$ such that
for every $A \in F$, e.g., in (3-31),

$$\mu_y(A) = \int_A \phi_1(\omega)\,d\mu_v(\omega), \tag{3-33}$$

or in differential form

$$\phi_1(\omega) = \frac{d\mu_y}{d\mu_v}(\omega). \tag{3-34}$$

By the same reasoning, for (3-32),

$$\phi_2(\omega) = \frac{d\mu_{xy}}{d\mu_x \times d\mu_v}(\omega). \tag{3-35}$$


The function $\phi$, known as a likelihood ratio, plays a key role in
the derivation of mutual information. From the Cameron-Martin
translation theorem [45], [46], [58] for the system (3-28) and (3-29),
the likelihood ratios become

$$\frac{d\mu_y}{d\mu_v}(y) = \exp\left\{\int_0^t \hat{h}(x_s,s)^T R^{-1}\,dy_s
 - \tfrac{1}{2}\int_0^t \hat{h}(x_s,s)^T R^{-1}\,\hat{h}(x_s,s)\,ds\right\}, \tag{3-36}$$

$$\frac{d\mu_{xy}}{d\mu_x \times d\mu_v}(x,y) = \exp\left\{\int_0^t h(x_s,s)^T R^{-1}\,dy_s
 - \tfrac{1}{2}\int_0^t h(x_s,s)^T R^{-1}\,h(x_s,s)\,ds\right\}, \tag{3-37}$$

where $\hat{h}(x_s,s) = E[h(x_s,s)\,|\,F_s^Y]$. If all the measures
considered are probability measures $P_y$, $P_v$, and $P_{xy}$,
respectively, then the Radon-Nikodym derivatives $\phi_1$ and $\phi_2$
become the density ratios

$$\phi_1 = \frac{dP_y}{dP_v}, \qquad \phi_2 = \frac{dP_{xy}}{dP_x\,dP_v}.$$

So, by letting $\phi$ be

$$\phi = \frac{\phi_2}{\phi_1} = \frac{dP_{xy}}{dP_x\,dP_y}, \tag{3-38}$$


Then, from the definition of mutual information (in the Shannon sense),

$$I(x_t,y_t) = \int \phi(x_t,y_t)\ln\phi(x_t,y_t)\,dP_x\,dP_y. \tag{3-39}$$

Since $P_{xy}(x_t,y_t) = P_{x|y}(x_t|y_t)\,P_y(y_t)$,

$$\phi(x_t,y_t) = \frac{dP_{x|y}(x_t|y_t)\,dP_y(y_t)}{dP_x(x_t)\,dP_y(y_t)}
 = \frac{dP_{x|y}(x_t|y_t)}{dP_x(x_t)}. \tag{3-40}$$

So, inserting (3-40) into (3-39) yields

$$I(x_t,y_t) = \int \frac{dP_{x|y}(x_t|y_t)}{dP_x(x_t)}\,
 \ln\frac{dP_{x|y}(x_t|y_t)}{dP_x(x_t)}\;dP_x(x_t)\,dP_y(y_t). \tag{3-41}$$


If probability densities are used instead of distributions, with the
notation

$$p_x(x_t) = \frac{dP_x}{dx}, \qquad p_y(y_t) = \frac{dP_y}{dy}, \qquad
 p_{x|y}(x_t|y_t) = \frac{dP_{x|y}}{dx},$$

(3-41) becomes

$$I(x_t,y_t) = \int p_{x|y}(x_t|y_t)\,\ln\frac{p_{x|y}(x_t|y_t)}{p_x(x_t)}\;p_y(y_t)\,dx_t\,dy_t
 = E\left\{\ln\frac{p_{x|y}(x_t|y_t)}{p_x(x_t)}\right\}
 = H(x_t) - H(x_t|y_t). \tag{3-42}$$


Therefore, to compute the mutual information for the system (3-28),
(3-29) one is, again, required to know either two densities
(unconditional and conditional) or two entropies. Next is a brief
discussion of the solution of these density equations and of methods
for approximating the densities using appropriate moments.


1.  $p(x_t)$ and two-moment approximation

Consider the system equation (3-28) again:

$$dx_t = f(x_t,t)\,dt + G(x_t,t)\,dw_t, \qquad x_{t_0} = x_0. \tag{3-43}$$

Due to the unknown initial state $x_0$ and the additive noise $w_t$,
the process $\{x_t\}$ can only be described statistically. As is known
[36], [57], the evolution of the probability density $p(x_t)$ obeys
the Kolmogorov forward equation

$$\frac{\partial p}{\partial t} = -\sum_{i=1}^{n}\frac{\partial (p f_i)}{\partial x_i}
 + \tfrac{1}{2}\sum_{i,j=1}^{n}\frac{\partial^2 \left[p\,(GQG^T)_{ij}\right]}{\partial x_i\,\partial x_j}, \tag{3-44}$$

where all the arguments in the expression are omitted for brevity.
Unfortunately, the above partial differential equation can be solved
only for a few special simple cases. So, in many practical problems
one relies on an alternative approximation approach, as in state
estimation; e.g., one obtains suitably approximated moments of the
density instead of the density itself. In particular, the first two
moments are important for entropy computation even though they do not
completely characterize the density $p(x_t)$. It is known [36] that
the first two moments, the mean $\hat{x}_t$ and covariance $P_t$,
propagate according to


$$\dot{\hat{x}}_t = E[f(x_t,t)], \tag{3-45}$$

$$\dot{P}_t = E[f(x_t,t)\,x_t^T] - E[f(x_t,t)]\,\hat{x}_t^T + E[x_t f^T(x_t,t)]
 - \hat{x}_t\,E[f^T(x_t,t)] + E[G(x_t,t)Q(t)G^T(x_t,t)], \tag{3-46}$$

where $\hat{x}_t = E[x_t\,|\,x_s,\ s < t]$. By neglecting third and
higher-order moments in the evaluation of (3-45) and (3-46), one
obtains the following approximated version for $\hat{x}_t$ and $P_t$:

$$\dot{\hat{x}}_t = f(\hat{x}_t,t) + \tfrac{1}{2}P_t f_{xx}(\hat{x}_t,t), \tag{3-47}$$

$$\dot{P}_t = f_x(\hat{x}_t,t)P_t + P_t f_x^T(\hat{x}_t,t) + G(\hat{x}_t,t)Q(t)G^T(\hat{x}_t,t)
 + P_t G_x(\hat{x}_t,t)Q(t)G^T(\hat{x}_t,t) + P_t G(\hat{x}_t,t)Q(t)G_x^T(\hat{x}_t,t), \tag{3-48}$$

where $f_x(.)$ and $G_x(.)$ are first partial derivatives and
$f_{xx}(.)$, $G_{xx}(.)$ are second partial derivatives at
$\hat{x}_t$. Further, if the second partials in (3-47), (3-48) are
negligible compared to the first partials and G(.) is not a function
of $x_t$, then

$$\dot{\hat{x}}_t = f(\hat{x}_t,t), \tag{3-49}$$

$$\dot{P}_t = f_x(\hat{x}_t,t)P_t + P_t f_x^T(\hat{x}_t,t) + G(t)Q(t)G^T(t), \tag{3-50}$$

which is a commonly used approximation. Of course, there are many
other algorithms which can be practically useful.
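A minimal sketch of how the first-order moment equations (3-49), (3-50) might be integrated numerically for a scalar state is given below. The forward-Euler scheme, the function name `propagate_moments`, and the example drift functions are assumptions made for illustration; they are not taken from the text.

```python
def propagate_moments(f, f_x, g, q, m0, p0, dt, steps):
    """Forward-Euler integration of (3-49), (3-50) for a scalar state.

    f, f_x : drift and its partial derivative, called as f(x, t)
    g, q   : noise gain G(t) and noise strength Q(t)
    m0, p0 : initial mean and variance
    """
    m, p = m0, p0
    for k in range(steps):
        t = k * dt
        slope = f_x(m, t)  # linearization evaluated at the current mean
        m_next = m + dt * f(m, t)
        p = p + dt * (2.0 * slope * p + g(t) * q(t) * g(t))
        m = m_next
    return m, p

# Hypothetical linear test case dx = -x dt + dw with Q = 2; here the
# variance equation has the steady state P = G*Q*G / 2 = 1.
m_lin, p_lin = propagate_moments(
    f=lambda x, t: -x, f_x=lambda x, t: -1.0,
    g=lambda t: 1.0, q=lambda t: 2.0,
    m0=1.0, p0=5.0, dt=1.0e-3, steps=10000)

# Hypothetical nonlinear case dx = -x^3 dt + dw, linearized about the mean.
m_cub, p_cub = propagate_moments(
    f=lambda x, t: -x ** 3, f_x=lambda x, t: -3.0 * x ** 2,
    g=lambda t: 1.0, q=lambda t: 0.1,
    m0=1.0, p0=0.5, dt=1.0e-3, steps=2000)

print(m_lin, p_lin, m_cub, p_cub)
```

In the linear case the equations are exact, so the run simply confirms the known steady-state variance; in the nonlinear case the result is only as good as the first-order truncation.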

2.  $p(x_t|y_t)$ and extended linear filter

The conditional density $p(x_t|y_t)$ of the system (3-28), (3-29)
satisfies the nonlinear stochastic partial differential equation
commonly known as the Kushner equation [34], [36]:

$$dp = \left[-\sum_{i=1}^{n}\frac{\partial (p f_i)}{\partial x_i}
 + \tfrac{1}{2}\sum_{i,j=1}^{n}\frac{\partial^2 \left[p\,(GQG^T)_{ij}\right]}{\partial x_i\,\partial x_j}\right]dt
 + \{h(x_t,t) - E h(x_t,t)\}^T R^{-1}(t)\,(dy_t - E h(x_t,t)\,dt)\,p. \tag{3-51}$$

Due to the additional measurement-related third term in (3-51), it may
be more complicated to solve than (3-44). To obtain the conditional
moments of the pdf $p(x_t|y_t)$ of (3-51), let

$$\hat{\psi}(x_t) = E[\psi(x_t)\,|\,F_t^Y];$$

then any conditional moment satisfies the stochastic differential
equation

$$d\hat{\psi}(x_t) = \{E[\psi_x^T f] + \tfrac{1}{2}\,\mathrm{tr}[E(GQG^T\psi_{xx})]\}\,dt
 + (E[\psi h] - \hat{\psi}\hat{h})^T R^{-1}(dy_t - \hat{h}\,dt), \tag{3-52}$$

where $\hat{h} = E[h(x_t,t)\,|\,F_t^Y]$ and $\psi_x$, $\psi_{xx}$ are
the first and second partial derivatives of $\psi$ with respect to
$x_t$, respectively. By letting $\psi(x_t) = x_t$ and
$\psi(x_t) = x_t x_t^T$, one obtains the mean and covariance as

$$d\hat{x}_t = \hat{f}(x_t,t)\,dt + \{E[x_t h^T(x_t,t)] - \hat{x}_t E[h^T(x_t,t)]\}\,R^{-1}(t)\,\{dy_t - E[h(x_t,t)]\,dt\}, \tag{3-53}$$

$$dP_t = \{E[(x_t-\hat{x}_t)f^T] + E[f(x_t-\hat{x}_t)^T] + E[GQG^T]
 - E[(x_t-\hat{x}_t)h^T]\,R^{-1}(t)\,E[h(x_t-\hat{x}_t)^T]\}\,dt
 + E[(x_t-\hat{x}_t)(x_t-\hat{x}_t)^T(h-Eh)^T]\,R^{-1}(t)\,\{dy_t - Eh\,dt\}. \tag{3-54}$$

Since $P_t$ is a function of the higher-order moments, the result is
not a finite-dimensional filter in general. So, various approximations
and assumptions are made to ensure that (3-53), (3-54) become
finite-dimensional and practically implementable filter algorithms.
If, again, G(.) is a function of t only, and a first-order expansion of
f(.) and h(.) is made, then (3-53), (3-54) reduce to the well-known
extended Kalman filter

$$d\hat{x}_t = f(\hat{x}_t,t)\,dt + P_t h_x^T R^{-1}(t)\,[dy_t - h(\hat{x}_t,t)\,dt], \tag{3-55}$$

$$\dot{P}_t = f_x(\hat{x}_t,t)P_t + P_t f_x^T(\hat{x}_t,t) + G(t)Q(t)G^T(t) - P_t h_x^T R^{-1}(t)\,h_x P_t, \tag{3-56}$$

where

$$f_x = \left.\frac{\partial f}{\partial x_t}\right|_{x_t = \hat{x}_t}, \qquad
 h_x = \left.\frac{\partial h}{\partial x_t}\right|_{x_t = \hat{x}_t}.$$

The Kalman-Bucy filter is obtained, of course, if the system and
measurement equations are linear. Depending on the order of the
expansion of f(.) and h(.), second or even higher-order filters can be
derived.

Notice here that the use of any approximated moment expressions of
the density instead of the density itself incurs a conceptual change
of the mutual information from $I(x_t,y_t)$ to $I(\hat{x}_t,y_t)$,
where $\hat{x}_t = E[x_t\,|\,F_t^Y]$. In the next section, the
second-order moment approximation of the density functions $p(x_t)$
and $p(x_t|y_t)$ will be discussed in the computation of the mutual
information $I(\hat{x}_t,y_t)$. Before this, the relationship between
the Shannon and Fisher information will be summarized for the
stochastic system instead of the random variable case. The following
are the vector versions of the results of Liptser and Shiryayev [56].

3.  Relationship between Shannon and Fisher Information.

Consider the general nonlinear stochastic system as in (3-28),
(3-29). The nonlinear functional dependence of f(.) and G(.) on $x_t$
makes the derivation of any relationship between the two information
concepts very difficult. This difficulty can be avoided if a specific
form of nonlinear system is assumed, for example, a system with linear
dynamics and nonlinear observation. In this case, the system is given
as

$$dx_t = f(t)\,x_t\,dt + g(t)\,dw_t, \tag{3-57}$$

$$dy_t = h(x_t,y_t,t)\,dt + dv_t. \tag{3-58}$$

Note that h(.) can also be a function of the observation $y_t$ itself,
under the bounded strong solution condition for t and the
nonanticipativeness for $y_t$. Assume further that h(.) satisfies

$$\int_{t_0}^{t} [h^T(x_s,y_s,s)\,h(x_s,y_s,s)]\,ds < \infty, \tag{3-59}$$

for each t, $t_0 \le t \le T$, and that the two densities $p(x_t)$ and
$p(x_t|y_t)$ are twice continuously differentiable with respect to
$x_t$.

Then the Fisher and Shannon information have the following relation:

$$I(x_t,y_t) = I_0(x_t,y_t) - \tfrac{1}{2}\int_{t_0}^{t}\mathrm{tr}\{g(s)g^T(s)\,[E J(x_s,y_s) - J(x_s)]\}\,ds, \tag{3-60}$$

where $I_0(x_t,y_t)$ is the Shannon information quantity due to the
observation equation (3-58) only, i.e., the case where the statistical
uncertainty of the process $\{x_t\}$ is not considered, and
$J(x_t,y_t)$, $J(x_t)$ are the Fisher information quantities
corresponding to the densities $p(x_t|y_t)$ and $p(x_t)$,
respectively. $I_0(x_t,y_t)$ can be expressed according to [45],
[46], [56] as

$$I_0(x_t,y_t) = \tfrac{1}{2}\,\mathrm{tr}\int_{t_0}^{t} E\{[h(x_s,y_s,s) - \hat{h}(x_s,y_s,s)]\,[h(x_s,y_s,s) - \hat{h}(x_s,y_s,s)]^T\}\,ds, \tag{3-61}$$

where

$$\hat{h}(x_t,y_t,t) = E\{h(x_t,y_t,t)\,|\,F_t^Y\}.$$

The proof of the relation (3-60) can be found in the cited
reference.


3.4  Observability using mutual information

As mentioned before, the mutual information $I(x_t,y_t)$ is the
information content (in Shannon's sense) about the state $x_t$ which is
contained in the observation $y_t$, i.e., the common information of the
two processes $x_t$ and $y_t$. So, once it is computed, it represents
the "tightness" of the connection of the state to the observation
$y_t$. Hence, it might be used as a criterion to determine the degree
of observability of the given stochastic system. The term
"observability" here is, of course, used with a meaning different from
the deterministic case, and different even from the traditional
stochastic case where the Fisher information is commonly used.

Just as the Fisher information matrix and the observability matrix
are used together in practice in a traditional observability
determination, Shannon's mutual information and the term
"observability" will be used together henceforth.

But due to the difficulty in solving the exact density
equations, i.e., the Kolmogorov forward equation and the Kushner
equation, approximated moment expressions are utilized instead.

Before this, former results on stochastic system observability
are summarized next.


1.  Former results on stochastic system observability.

Consider, again, a general form of stochastic system (3-28), (3-29):

$$dx_t = f(x_t,t)\,dt + G(x_t,t)\,dw_t, \tag{3-62}$$

$$dy_t = h(x_t,t)\,dt + dv_t. \tag{3-63}$$

The traditional approach to the determination of the observability of
the system (in the Fisher sense) is as follows: using the likelihood
function $\Lambda$ with $\Lambda = p(x_t|y_t)$ for the noiseless system
(Q(t) = 0) in (3-28), (3-29), its logarithm $\ln(\Lambda)$ is maximized
according to the definition of the Fisher information

$$J = -E\left[\frac{\partial^2 \ln(\Lambda)}{\partial x_t\,\partial x_t^T}\right]. \tag{3-64}$$

Then, the Fisher information matrix $J(t,t_0)$ for the first-order
approximation of the system about the estimate $\hat{x}_t$ is obtained
as [34]-[36], [38], [46], [47]

$$J(t,t_0) = \Phi^T(t_0,t)\,P_0^{-1}\,\Phi(t_0,t) + \int_{t_0}^{t}\Phi^T(s,t)\,H^T R^{-1} H\,\Phi(s,t)\,ds
 = J_i(t,t_0) + J_o(t,t_0), \tag{3-65}$$

where $\Phi(.)$ is the transition matrix for the linearized portion of
f(.) at $\hat{x}_t$,

$$\frac{\partial \Phi}{\partial t}(t,s) = F(t)\,\Phi(t,s), \qquad \Phi(t,t) = I, \tag{3-66}$$

$$F(t) = \left.\frac{\partial f}{\partial x_t}\right|_{x_t = \hat{x}_t}, \qquad
 H(t) = \left.\frac{\partial h}{\partial x_t}\right|_{x_t = \hat{x}_t},$$

and $J_i$ is due to the initial information $P_0^{-1}$ while $J_o$ is
due to the observed information. Alternatively, after some algebraic
manipulation, a recursive version of (3-65) is obtained as

$$\frac{dJ(.)}{dt} = -F^T(t)\,J(.) - J(.)\,F(t) + H^T(t)\,R^{-1}(t)\,H(t). \tag{3-67}$$
Traditionally $J_o(.)$ is called an observability matrix (some
authors [34], [36] call it an information matrix). Positive
definiteness or nonsingularity of $J_o(.)$ is then used as a criterion
for determining the observability of the system. Alternatively, for
some positive constants $\alpha$, $\beta$, s, and unit matrix I, the
relation

$$0 < \alpha I \le J_o(t,t-s) \le \beta I \tag{3-68}$$

is checked for all $t \ge t_0 + s$ [36].

However, the Fisher information matrix J is related to the
estimation error covariance matrix $P_t$ by [47]

$$P_t \ge \left(I + \frac{\partial b}{\partial x}\right) J^{-1} \left(I + \frac{\partial b}{\partial x}\right)^T, \tag{3-69}$$

with b(t) being the bias of $\hat{x}_t$ with respect to $x_t$. If
$\hat{x}_t$ is an unbiased estimator, then

$$P_t \ge J^{-1}. \tag{3-70}$$

Further, if $\hat{x}_t$ is optimal, then equality holds in (3-70).
I.e., the covariance of the individual state estimation error is
bounded below by the diagonal elements of $J^{-1}$, which is the
so-called Cramer-Rao lower bound.

In addition to the positive definiteness of the observability matrix,
the eigenvalues of this matrix are sometimes utilized to test the
system observability [37]. The appearance of any zero eigenvalue means
that $J_o$ is singular and the system is unobservable. A wide spread
(high stiffness) among the eigenvalues means that the system is weakly
observable. The condition number q of $J_o$,

$$q = \frac{e}{s},$$

where e and s are the maximum and minimum eigenvalues, respectively,
is used as an indicator of the system observability.
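The eigenvalue test can be sketched for a two-state case where everything is available in closed form. The example below is hypothetical: it assumes a double integrator (position and velocity), unit measurement noise, and an observation Gramian of the integral form in (3-65) but referenced to the initial time, which differs from the $\Phi(s,t)$ convention used above. The function names are illustrative.

```python
import math

def sym2_eigenvalues(a, b, d):
    """(min, max) eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    half_trace = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return half_trace - radius, half_trace + radius

def condition_number(a, b, d):
    """q = e/s; returned as infinity when the smallest eigenvalue is not positive."""
    s, e = sym2_eigenvalues(a, b, d)
    return math.inf if s <= 0.0 else e / s

def gramian_position_measured(T):
    """Integral over [0, T] of [1, s]^T [1, s] ds (position is measured)."""
    return T, T ** 2 / 2.0, T ** 3 / 3.0

def gramian_velocity_measured(T):
    """Integral over [0, T] of [0, 1]^T [0, 1] ds (only velocity is measured)."""
    return 0.0, 0.0, T

q_pos = condition_number(*gramian_position_measured(1.0))
q_vel = condition_number(*gramian_velocity_measured(1.0))
print(q_pos, q_vel)
```

With position measured the Gramian is nonsingular but moderately ill-conditioned; with only velocity measured one eigenvalue is zero, so position cannot be recovered and q is reported as infinite.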

A somewhat different approach was studied by Sunahara [59]. A
stochastic system (3-62), (3-63) is said to be observable if there
exists an estimator such that the associated error converges to a
sufficiently small value on the time interval $[t_0, t_f]$ in some
stochastic sense, i.e., for preassigned error constants $\delta$ and
$\epsilon$, $0 < \epsilon < 1$, if

$$P(\|x_t - \hat{x}_t\|^2 > \delta) < \epsilon \tag{3-71}$$

is satisfied, then the system is said to be stochastically observable.
Here $\hat{x}_t$ is obtained by the preassigned filter form

$$d\hat{x}_t = f(\hat{x}_t,t)\,dt + P_t H^T(t)\,[dy_t - h(\hat{x}_t,t)\,dt], \tag{3-72}$$

for matrices $P_t$ and H(t) of appropriate dimensions.

Even though the Fisher-information approach is the most widely used
in the observability determination of a given stochastic system,
several disadvantages can be identified when it is compared with
Shannon's mutual-information approach.

1)  Even though the theoretical definition of the Fisher
information (3-64) can accommodate the system noise $w_t$, the
practically used form (3-65) does not accommodate $w_t$ as long
as the likelihood function is chosen as the conditional
density $p(x_t|y_t)$. Neglect of the system noise may cause
incorrect results when $w_t$ is significant compared to the other
noises [39]. A convenient form to handle both system and
observation noises is not yet available. The mutual
information, however, conveniently considers both noises
simultaneously, since by definition it always requires both
densities $p(x_t|y_t)$ and $p(x_t)$ together.

2)  If the system is unobservable or marginally observable, then
singularity or near-singularity of the observability matrix
makes it very difficult to compute this matrix in practice.
This problem does not occur in the mutual information
computation, as can be seen in the next subsection.

3)  Extending linear results to the general nonlinear case
requires many approximations. In the nonlinear case, a
general form of transition matrix does not exist. $I(x_t,y_t)$
also requires many approximations to be practically
implementable, but here one can use the many well-developed
nonlinear filters which are already publicly available.

4)  Even with the above problems in the Fisher information
approach, its simplicity of calculation and recursive nature
make it popular in linear or linearized applications with
negligible system noise.


2.  Observability computation using mutual information.

From the discussion of the previous section, computation of
observability in terms of mutual information may be carried out
conveniently by an approximated filter algorithm in many cases. From
(3-42), the computation of $I(x_t,y_t)$ requires two entropies: the
marginal entropy $H(x_t)$ and the conditional entropy $H(x_t|y_t)$.
Both entropies can be computed from the relations

$$H(x_t) = \tfrac{n}{2}\ln A + \tfrac{1}{2}\ln(\det \Gamma_t^T), \tag{3-73}$$

$$H(x_t|y_t) = \tfrac{n}{2}\ln A + \tfrac{1}{2}\ln(\det P_t^T), \qquad P_{t_0}^T = P_{t_0}, \tag{3-74}$$

where $\Gamma_t^T$ is the covariance for the marginal density $p(x_t)$
and $P_t^T$ is the covariance for the conditional density
$p(x_t|y_t)$. Note that the superscript T is not a transpose here. No
approximation is assumed in either $\Gamma_t^T$ or $P_t^T$. Therefore,
from insertion of (3-73), (3-74) into (3-42)

The inequality in (3-79) is due to the information loss which is
incurred during the approximation procedures. For the observability of
an individual state, say the i-th state, the following will be used:

$$I_i(x_i,y_t) = \tfrac{1}{2}\ln\left(\frac{\Gamma_{ii}}{P_{ii}}\right), \qquad i = 1, 2, \ldots, n, \tag{3-80}$$

where $x_i$ is the i-th element of $x_t$, and $\Gamma_{ii}$ and
$P_{ii}$ are the diagonal elements of $\Gamma_t$ and $P_t$,
respectively. Of course, $\Gamma_t$ and $P_t$ are computed as a part of
state estimation. Thus, both are defined only when they are positive
definite. The degree of observability at time t is easily computed by
reading $\Gamma_t$, $P_t$ and a simple computation according to (3-78).
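A minimal sketch of this computation for a two-state system follows. It assumes that (3-78), which is not reproduced in this portion of the text, has the log-determinant-ratio form obtained by subtracting (3-74) from (3-73); the covariance values and function names are hypothetical.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def degree_of_observability(gamma, p):
    """Assumed form of (3-78): 0.5 * ln(det Gamma_t / det P_t), in nats."""
    return 0.5 * math.log(det2(gamma) / det2(p))

def per_state_observability(gamma, p):
    """Per-state measure (3-80): 0.5 * ln(Gamma_ii / P_ii) for each state i."""
    return [0.5 * math.log(gamma[i][i] / p[i][i]) for i in range(len(gamma))]

# Hypothetical covariances read from a filter at some time t:
# the first state has been learned well, the second only slightly.
gamma_t = [[4.0, 1.0],
           [1.0, 9.0]]   # a priori (unconditional) covariance
p_t = [[1.0, 0.2],
       [0.2, 6.0]]       # a posteriori (conditional) covariance

total = degree_of_observability(gamma_t, p_t)
per_state = per_state_observability(gamma_t, p_t)
print(total, per_state)
```

Because only determinants and diagonal ratios are needed, no matrix inversion is involved, which is the practical advantage noted in item 2) of the comparison above.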

From (3-78) it is clear that for $I(x_t,y_t)$ to be maximum, $P_t$
must be minimum. If the minimum covariance $P_t^*$ of the estimation
error is obtained by the unbiased optimum estimator, then the maximum
Fisher information is obtained also [47], i.e., the Cramer-Rao lower
bound is attained in this case. So,

$$P_t \ge P_t^* = J^{-1}. \tag{3-81}$$


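To observe the variation of observability due to changes in
$\Gamma_t$ and $P_t$, consider the simple linear system

$$dx_t = F(t)\,x_t\,dt + G(t)\,dw_t, \tag{3-82}$$

$$dy_t = H(t)\,x_t\,dt + dv_t, \tag{3-83}$$

where $w_t$ and $v_t$ have strengths Q(t) and R(t), respectively.
The covariances $\Gamma_t$ and $P_t$ then satisfy

$$\dot{\Gamma}_t = F(t)\Gamma_t + \Gamma_t F^T(t) + G(t)Q(t)G^T(t), \qquad \Gamma_{t_0} = \Gamma_0, \tag{3-84}$$

$$\dot{P}_t = F(t)P_t + P_t F^T(t) + G(t)Q(t)G^T(t) - P_t H^T(t)R^{-1}(t)H(t)P_t, \qquad P_{t_0} = P_0. \tag{3-85}$$

Mutual information change in this system arises in two ways. One is
through the initial information $\Gamma_{t_0}$, $P_{t_0}$, and the
other is through the measurement mechanism H(t).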

Even assuming the same initial information, such that
$\Gamma_{t_0} = P_{t_0}$, the magnitude of $\Gamma_{t_0}$ or $P_{t_0}$
plays an important role at the final time. For example, a large initial
covariance makes the system observability grow quickly at the initial
stage, since $P_t$ in (3-85) tends to decrease rapidly to its steady
state if the filter works properly. The main reason for this is the
last term of (3-85). However, $\Gamma_t$ does not change rapidly, since
there is no such term in (3-84). Some guidelines for choosing a proper
initial covariance in simulation can be found in [60]. In most cases,
however, the choice of a specific value of $P_{t_0}$ is based on the
designer's "degree of confidence" in $\hat{x}_{t_0}$ relative to the
unknown true value $x_{t_0}$. If the choice is too optimistic (too
small a $P_{t_0}$ through overconfidence), then information growth may
be very slow even when the system is deterministically observable. So,
tuning of the filter is a compromise between the two extremes, reached
by trial and error until desirable performance is obtained.

The effect of measured information on observability is also seen
through the last term $P_t H^T(t)R^{-1}(t)H(t)P_t$ in (3-85). The
measurement structure matrix H(t) and the noise strength R(t) are
especially important here. So, if this term is negligible for some
reason, for example, $R(t) \to \infty$ and/or $H(t) \to 0$, then the
rates of change $\dot{\Gamma}_t$ and $\dot{P}_t$ in (3-84), (3-85) will
be almost the same. Thus, mutual information or observability will not
grow any more in this case.
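The two effects just described can be reproduced with a scalar simulation. The sketch below is an illustration under stated assumptions: a time-invariant scalar system, forward-Euler integration of the covariance equations (3-84) and (3-85) from a common initial value, and the log-ratio $\tfrac{1}{2}\ln(\Gamma_t/P_t)$ taken as the degree of observability. All parameter values and the function name are hypothetical.

```python
import math

def information_history(F, G, Q, H, R, p0, dt, steps):
    """Euler-integrate the scalar forms of (3-84), (3-85) from Gamma_0 = P_0 = p0
    and return 0.5*ln(Gamma_t / P_t) after each step."""
    gamma, p = p0, p0
    history = []
    for _ in range(steps):
        gamma_dot = 2.0 * F * gamma + G * Q * G
        p_dot = 2.0 * F * p + G * Q * G - p * H * (1.0 / R) * H * p
        gamma += dt * gamma_dot
        p += dt * p_dot
        history.append(0.5 * math.log(gamma / p))
    return history

params = dict(F=-0.5, G=1.0, Q=1.0, R=1.0, dt=1.0e-3, steps=5000)

info_large_p0 = information_history(H=1.0, p0=10.0, **params)   # little initial confidence
info_small_p0 = information_history(H=1.0, p0=0.01, **params)   # overconfident start
info_no_meas = information_history(H=0.0, p0=10.0, **params)    # measurement term removed

# Compare the information gained by t = 0.5 (index 499) and at the final time.
print(info_large_p0[499], info_small_p0[499], info_large_p0[-1], info_no_meas[-1])
```

With H = 0 the two covariance equations coincide, so the log-ratio stays at zero; with a large initial covariance the measurement term dominates early and the information rises much faster than in the overconfident case.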

A  short  discussion  of  the  relation  between  the  deterministic 
observability  condition  and  the  mutual  information  for  the  linear 
system  case  is  made  next. 

3.  Linear  systems:  deterministic  and  stochastic  observability. 

Mutual  information,  or  formally,  stochastic  observability  of  a 
system  is  approximated  as  the  log  ratio  of  the  two  covariances  rt  and 
P^.  So,  the  relationship  between  deterministic  and  stochastic 
observability  is  characterized  by  the  relation  between  these  matrices 
and  the  satisfaction  of  the  deterministic  observability  condition.  To 
avoid  complexity  consider  a  stochastic  linear  (time-invariant)  system 

dx^  =  Ex^dt  +  gdw^,  (3-86) 

dy^  =  Hx^dt  +  dv^  ,  (3-87) 

where  w^,  have  covariances  Q,  R  respectively.  For  this  system  a 
theorem  is  cited  from  [56]. 

Theorem  3-1 

Let the system (3-86), (3-87) satisfy the deterministic observability condition, i.e., the observability matrix

$$V = \begin{bmatrix} H \\ HF \\ \vdots \\ HF^{n-1} \end{bmatrix} \tag{3-88}$$

has rank $n$. Then the covariance matrix $P_t$ of the system is uniformly bounded and converges to its limit $P_\infty$, where

$$P_\infty = \lim_{t\to\infty} P_t = \lim_{t\to\infty} E[(x_t - \hat{x}_t)(x_t - \hat{x}_t)^T], \tag{3-89}$$

is the solution of

$$F P_\infty + P_\infty F^T + G Q G^T - P_\infty H^T R^{-1} H P_\infty = 0. \tag{3-90}$$

Uniform boundedness is proved by changing the system dynamic equation (3-86) into its auxiliary control problem.

Remarks

1) For uniform boundedness and convergence of $P_t$ to $P_\infty$, at least the unstable states, if any exist, must be deterministically observable [61]. If the system is stable, then the observability rank condition (3-88) can be dropped for boundedness of $P_t$.


2) If the matrix pair $(F, G)$ in (3-86) constitutes a controllable system, i.e., the controllability matrix $M$ (when $w_t$ is considered as an input control),

$$M = (G,\ FG,\ \ldots,\ F^{n-1}G),$$

has rank $n$, then $P_\infty$ is positive definite [36], [56].

From  the  previous  theorem ,  the  following  result  can  be  proved. 
In  the  theorem,  covariance  matrix  manipulation  identities  are  cited 
from  the  results  of  Balakrishnan  [61]. 

Theorem  3-2 

If the time-invariant linear system (3-86), (3-87) is deterministically observable, then it becomes stochastically more observable in the sense that the mutual information $I(x_t, y_t)$ increases with time.

Proof 

By Theorem 3-1 the covariance $P_t$, which is the solution of

$$\dot{P}_t = F P_t + P_t F^T + G Q G^T - P_t H^T R^{-1} H P_t, \quad P_{t_0} = P_0, \tag{3-91}$$

converges uniformly to $P_\infty$ if the system is deterministically observable. Now consider the covariance $\Gamma_t$, where

$$\dot{\Gamma}_t = F \Gamma_t + \Gamma_t F^T + G Q G^T, \quad \Gamma_{t_0} = P_0. \tag{3-92}$$

We want to show next the relation $\Gamma_t > P_t$. Differentiation of (3-92) gives

$$\ddot{\Gamma}_t = F \dot{\Gamma}_t + \dot{\Gamma}_t F^T, \tag{3-93}$$

and the solution of (3-93) becomes

$$\dot{\Gamma}_t = \Phi(t)\,\dot{\Gamma}_0\,\Phi^T(t), \tag{3-94}$$

where

$$\dot{\Phi}(t) = F \Phi(t), \quad \Phi(0) = I. \tag{3-95}$$

Using the same procedure for (3-91) gives

$$\ddot{P}_t = (F - P_t H^T R^{-1} H)\,\dot{P}_t + \dot{P}_t\,(F - P_t H^T R^{-1} H)^T, \tag{3-96}$$

and its solution

$$\dot{P}_t = \Psi(t)\,\dot{P}_0\,\Psi^T(t), \tag{3-97}$$

$$\dot{\Psi}(t) = (F - P_t H^T R^{-1} H)\,\Psi(t), \quad \Psi(0) = I. \tag{3-98}$$

$\dot{\Gamma}_0$ in (3-94) and $\dot{P}_0$ in (3-97) are determined by letting $t = 0$ in (3-92) and (3-91), respectively. Let the eigenvalues of $F$ in (3-95) be $\lambda_1, \lambda_2, \ldots, \lambda_n$ such that

$$\lambda_1 > \lambda_2 > \cdots > \lambda_n,$$

and the eigenvalues of $(F - P_t H^T R^{-1} H)$ in (3-98) be $\mu_1, \mu_2, \ldots, \mu_n$ such that $\mu_1 > \mu_2 > \cdots > \mu_n$. Then, due to the term $P_t H^T R^{-1} H$ in (3-98), the relation

[relation between the $\lambda_i$ and $\mu_i$ not recoverable from the source]

holds. Considering (3-94) and (3-97), both having the same initial conditions,

$$D(t) = \Gamma_t - P_t > 0. \tag{3-99}$$

Further, the difference $D(t)$ is monotone in time, since all the eigenvalues appear in exponential form in $\Phi$ and $\Psi$ by the Cayley-Hamilton theorem. So, convergence of $P_t$ to $P_\infty$ and monotonicity of $D(t)$ imply, from (3-78), that $I(x_t, y_t)$ grows monotonically. Thus, the system becomes more observable as time progresses. **
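The statement of Theorem 3-2 can be checked numerically for a scalar system. The sketch below is only an illustration under stated assumptions: it takes the mutual information to be $I = \tfrac{1}{2}\ln(\Gamma_t / P_t)$ (the scalar form of the log ratio described at the start of this section, assumed here to be what (3-78) denotes), uses simple Euler integration, and picks hypothetical parameter values ($F = 0$, $G = 1$, $Q = 0.5$, $H = R = 1$, $P_0 = 4$) that are not taken from the text.

```python
import math

def mutual_information_history(f=0.0, q=0.5, h=1.0, r=1.0, p0=4.0,
                               dt=0.001, t_end=20.0):
    """Euler-integrate scalar versions of (3-91) and (3-92).

    Returns a list of (t, P_t, Gamma_t, I_t), where
    I_t = 0.5 * ln(Gamma_t / P_t) is the ASSUMED scalar form of (3-78).
    All default parameter values are hypothetical.
    """
    p = gamma = p0                      # same initial condition for both
    history = [(0.0, p, gamma, 0.0)]
    steps = int(round(t_end / dt))
    for k in range(1, steps + 1):
        dp = 2.0 * f * p + q - (p * h) ** 2 / r   # Riccati rate, cf. (3-91)
        dgamma = 2.0 * f * gamma + q              # Lyapunov rate, cf. (3-92)
        p += dt * dp
        gamma += dt * dgamma
        history.append((k * dt, p, gamma, 0.5 * math.log(gamma / p)))
    return history

if __name__ == "__main__":
    hist = mutual_information_history()
    for t, p, gamma, info in hist[::4000]:
        print(f"t={t:5.1f}  P={p:7.4f}  Gamma={gamma:7.4f}  I={info:6.3f}")
```

With these values $P_t$ settles at the root of the scalar form of (3-90) while $\Gamma_t$ keeps growing, so the computed $I$ rises throughout the run.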

More intuitive relations between the two observability concepts can be derived when absence of the process noise $w_t$ is assumed. In this case, using a matrix inversion identity [62] for (3-91) gives

$$\frac{d}{dt} P_t^{-1} = -F^T P_t^{-1} - P_t^{-1} F + H^T R^{-1} H, \quad P_{t_0}^{-1} = P_0^{-1}. \tag{3-100}$$

Then, the solution of (3-100) is

$$P_t^{-1} = \Phi^T(t_0,t)\,P_0^{-1}\,\Phi(t_0,t) + \int_{t_0}^{t} \Phi^T(s,t) H^T R^{-1} H \Phi(s,t)\,ds = J_0(t,t_0) + J(t,t_0). \tag{3-101}$$

Notice that $P_t^{-1}$ in (3-101) is exactly the Fisher information matrix of (3-65). Using the same procedure for the covariance $\Gamma_t$ in (3-92) yields

$$\frac{d}{dt} \Gamma_t^{-1} = -F^T \Gamma_t^{-1} - \Gamma_t^{-1} F, \quad \Gamma_{t_0}^{-1} = \Gamma_0^{-1} = P_0^{-1}, \tag{3-102}$$

and its solution

$$\Gamma_t^{-1} = \Phi^T(t_0,t)\,\Gamma_0^{-1}\,\Phi(t_0,t). \tag{3-103}$$

Assume here that $P_0^{-1}$ is nonsingular, i.e., there is some prior information about all states. If the system is deterministically observable, i.e., the second term of (3-101) is positive definite, then comparison of (3-101) and (3-103), considering again the definition (3-78), shows that $I(x_t, y_t)$ increases until $P_t$ reaches its limit.
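The comparison of (3-101) and (3-103) can be written out in one line. The following is a sketch under the assumption that (3-78) has the log-ratio form $I(x_t,y_t) = \tfrac{1}{2}\ln(|\Gamma_t| / |P_t|)$ described at the start of this section; the symbol $J_{\mathrm{obs}}$ is introduced here only as shorthand for the integral (second) term of (3-101).

```latex
% Since \Gamma_0 = P_0, the first term of (3-101) equals \Gamma_t^{-1} by (3-103):
P_t^{-1} = \Gamma_t^{-1} + J_{\mathrm{obs}}(t,t_0),
\qquad
J_{\mathrm{obs}}(t,t_0) = \int_{t_0}^{t} \Phi^T(s,t) H^T R^{-1} H \Phi(s,t)\,ds .

% Substituting into the assumed form of (3-78):
I(x_t,y_t)
  = \tfrac{1}{2}\ln\frac{|\Gamma_t|}{|P_t|}
  = \tfrac{1}{2}\ln\bigl(|\Gamma_t|\,|\Gamma_t^{-1} + J_{\mathrm{obs}}(t,t_0)|\bigr)
  = \tfrac{1}{2}\ln\bigl|\,I_n + \Gamma_t\,J_{\mathrm{obs}}(t,t_0)\,\bigr| .
```

Under these assumptions the mutual information is zero when the integral term vanishes and strictly positive when it is positive definite, because $|I_n + \Gamma_t J_{\mathrm{obs}}| = |I_n + \Gamma_t^{1/2} J_{\mathrm{obs}} \Gamma_t^{1/2}| > 1$ in that case.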

Now suppose that system noise $w_t$ exists. Then the solution of (3-92) is

$$\Gamma_t = \Phi(t,t_0)\,\Gamma_0\,\Phi^T(t,t_0) + \int_{t_0}^{t} \Phi(t,s) G(s) Q(s) G^T(s) \Phi^T(t,s)\,ds = C_0(t,t_0) + C_Q(t,t_0). \tag{3-104}$$

Notice that the matrix $C_Q(t,t_0)$ is traditionally termed the stochastic controllability matrix. So, from (3-101) and (3-78), the classical concepts of stochastic observability and controllability affect the mutual information as follows: $I(x_t, y_t)$ is increased by increases in both controllability and observability. The contribution of the stochastic controllability matrix $C_Q(t,t_0)$ is made by increasing $\Gamma_t$ in (3-104), and thus increasing $I(x_t, y_t)$ in (3-78).


CHAPTER 4: INFORMATION STRUCTURAL ANALYSIS OF
BOT AND ARRAY SONAR SYSTEMS

Simulation results of the information structural analysis of two important examples of nonlinear stochastic systems are presented here. The system models are the same underwater tracking problems as in Chapter 2, so that the results can be related to the deterministic observability conditions. To fit more practical situations in both the BOT and array SONAR tracking examples, it is assumed that information about the system states is acquired through a discrete measurement mechanism. However, the evolution of the system states is assumed to be continuous in time. Thus, the estimation of the system states is implemented by the discrete-observation, continuous-state filter algorithm.

Before presenting this, the following simple linear system results are provided to give a clear understanding of the current approach.

The term "observability" in this chapter, of course, means the degree of observability in terms of mutual information.


4-1 Falling-body example.


[Figure: falling body released from the initial position $z(0)$, with position $z(t) = x_1(t)$ and velocity $\dot{z}(t) = x_2(t)$, observed by a measurement device.]


This system is observed by a noisy measurement device, which can be expressed by

$$y_t = H x(t) + v_t, \tag{4-3}$$

where the random white Gaussian noise $v_t$ has covariance $R(t)$. A simple test shows that the deterministic portion of the system is observable if one observes the position $x_1$ and unobservable if one observes the velocity $x_2$. Intuitively this is clear: if one measures $x_1$, then its derivative gives the velocity $x_2$, and no other information is required to describe the system. However, when one measures the velocity $x_2$, an integration is required to get the position $x_1$, i.e.,

$$x_1(t) = \int_0^t x_2(\tau)\,d\tau + x_1(0), \tag{4-4}$$

but $x_1(0)$ cannot be determined from any measurement data. So, the system is unobservable in this case.
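The "simple test" mentioned above is the rank condition (3-88). A minimal sketch follows, assuming the deterministic part of the falling-body model is the double integrator $\dot{x}_1 = x_2$, $\dot{x}_2 = \text{const}$ (the constant gravity input does not affect observability); the matrix `F` below encodes that assumption and the helper names are hypothetical.

```python
def observability_matrix(F, H):
    """Stack H, HF, ..., HF^(n-1) for an n-state system with one output row H."""
    n = len(F)
    rows, row = [], list(H)
    for _ in range(n):
        rows.append(row)
        # next row = (current row) times F
        row = [sum(row[k] * F[k][j] for k in range(n)) for j in range(n)]
    return rows

def rank(matrix, tol=1e-12):
    """Matrix rank by Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rk = 0
    for col in range(n_cols):
        if rk == n_rows:
            break
        pivot = max(range(rk, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue                      # no usable pivot in this column
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, n_rows):
            factor = m[i][col] / m[rk][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

# Assumed double-integrator dynamics: x1 = position, x2 = velocity.
F = [[0.0, 1.0],
     [0.0, 0.0]]

if __name__ == "__main__":
    print("measure position:", rank(observability_matrix(F, [1.0, 0.0])))
    print("measure velocity:", rank(observability_matrix(F, [0.0, 1.0])))
```

Measuring position gives the stacked rows $[1\ 0]$ and $[0\ 1]$, while measuring velocity gives $[0\ 1]$ and $[0\ 0]$, which is the rank deficiency behind (4-4).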

Using the usual Kalman-Bucy filter with Gaussian noise, the mutual information $I(x_t, y_t)$ is compared in Table 2. In the deterministically observable case (measuring position $x_1$) the mutual information of the total system (T in Table 2) grows from zero to 5.7 at the final time of 20 sec. The position (p) and velocity (v) information grow to 4.9 and 1.8, respectively. For the unobservable case (measuring $x_2$) the corresponding observability grows to T = 2.8, p = 2.3, v = 1.8. To show the significance of the logarithmic scale, a linear scale is also shown. For the unobservable system, only the observed velocity variable keeps the same level as in the observable system (1.8).

The degree of observability directly affects the filtering error. This is analyzed in Figures 8 and 9. Figure 8 shows the deterministically observable case with initial errors of 20 m in position and 5 m/s in velocity. Since position is measured in this case, its information is dominant and thus the corresponding error decreases rapidly. The velocity error is also quite small at the final time, since velocity is also an observable variable. However, Figure 9 is much different from Figure 8, even with the same initial errors. Since velocity is observed here, position is an unobservable variable and thus carries very large errors up to the final time. The velocity variable (the observed quantity here) shows quite satisfactory performance compared to the position variable.

Table 3 shows the effect of the initial information $P_0$ ($= \Gamma_0$) on the observability+ and the filtering error. In general, as larger initial information is assumed (smaller $P_0$) the system obtains smaller final information. Note also that in most cases information acquisition is quite fast in the initial stage. This phenomenon is more significant as $P_0$ increases. It implies that the filter forgets the initial uncertainty very quickly when the assumed initial information is small. This is one of the most desirable features of the Kalman-Bucy filter. Practical experience suggests, however, that in stochastic nonlinear filter design with non-negligible nonlinearity it is desirable not to use overly pessimistic initial-error covariances, since a large $P_0$ could excessively dampen the system dynamics and the filter gain matrix and thus reject some of the valuable measurement data, in spite of the fast information pick-up from the measurement mechanism [64]. This phenomenon can be found in the position error ($e_p$) when the system is deterministically unobservable with a high value of $P_0$. The opposite choice, i.e., an overly optimistic $P_0$, sometimes makes the response of the filter too slow.

+Observability again refers to $I(x_t, y_t)$ for all the following data.
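The dependence of the final information on $P_0$ can be seen in the simplest possible setting. The sketch below is not the falling-body simulation of Table 3; it assumes a constant scalar state observed $n$ times with noise variance $R$, so that $\Gamma = P_0$ and $P^{-1} = P_0^{-1} + n/R$, and it again assumes the log-ratio form $I = \tfrac{1}{2}\ln(\Gamma/P)$ for the mutual information. All numerical values are hypothetical.

```python
import math

def information_after(n_meas, p0, r=1.0):
    """I = 0.5*ln(Gamma/P) for a constant scalar state observed n_meas times.

    Assumptions: no dynamics and no process noise, so Gamma stays at p0,
    and each measurement adds 1/r to the inverse covariance.
    """
    gamma = p0
    p = 1.0 / (1.0 / p0 + n_meas / r)   # information-form update
    return 0.5 * math.log(gamma / p)

if __name__ == "__main__":
    for p0 in (0.01, 1.0, 100.0):
        first = information_after(1, p0)
        final = information_after(20, p0)
        print(f"P0={p0:7.2f}  I(1)={first:6.3f}  I(20)={final:6.3f}  "
              f"share from first sample={first / final:5.2f}")
```

In this toy case a smaller $P_0$ yields a smaller final $I$, and the share of the information gathered by the very first measurement grows with $P_0$, which is the qualitative pattern described for Table 3.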

As  a  summary,  system  observability  is  strong  with  strong  position 
and  velocity  observability  when  the  system  is  deterministically 
observable.  But  it  is  weak  when  the  system  is  deterministically 
unobservable.  Since  position  is  an  unobservable  state  in  the  latter 
case,  its  poor  observability  generates  large  filtering  errors  during 
the  observed  period. 

4-2  BOT  system  and  Information  analysis. 

It is well known that a BOT system is observable only when relative maneuvering exists. This was checked again in Chapter Two using the so-called mixed-coordinate system (see also [33], [63]). Here the same problem is used to analyze and compare the observability content from the information theoretic point of view. For comparison, two more popular coordinate systems, rectangular and modified polar (MP) coordinates, are also adopted in this section. System descriptions for the individual coordinates are presented in Table 4, with noises of the proper dimensions. Measurement equations are written in discrete form for future convenience. Using the same procedures as derived in Chapter Two, deterministic observability for the remaining two coordinates can be checked. Long algebraic manipulation shows that the system is again observable when relative maneuvering exists. This is not surprising, since deterministic observability is not affected by the coordinate transformation.

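Note in Table 4 that the system equation is linear and the observation equation is nonlinear in rectangular coordinates, and vice versa in the modified-polar and mixed coordinates. The variables $r$, $v$, $a$, $\beta$ represent range, velocity, acceleration and bearing, respectively.

[Table 4: system and measurement equations in rectangular, modified polar and mixed coordinates. Of the table body only fragments of the measurement equations survive: an arctangent form $\tan^{-1}(\,\cdot\,) + v_k$ and a linear form $[\,0\ 0\ 1\ 0\,]\,x_k + v_k$.]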


To implement a nonlinear filter of the continuous-system, discrete-observation type that is not excessively complicated, a truncated second-order filter [34], [36] is considered. With the same target and observer (or ownship) configuration as in Figure 2, one-directional maneuvering is assumed as

$$a_x(t) = 0, \qquad a_y(t) = -0.25\cos(0.005t)\ \text{m/s}^2, \tag{4-5}$$

and the initial states are assumed Gaussian with proper mean.

Other parameters used are

$T$ (sampling interval) = 10 sec,
$\Delta t$ (time update interval between observations) = 1 sec,
$r(0)$ (initial range) = 8000 m,
$v_{Tx}$ (target vel. in x-direction) = 10 m/s ≈ 20 kt,
$v_{Ty}$ = 0,
$v_{Ox}$ (observer vel. in x-direction) = 15 m/s ≈ 30 kt,
$v_{Oy}$ = 5 sin(0.005t) m/s.

The measurement noise sequence and the system noise are also assumed to be Gaussian, with variances $R$ and $Q(t)$, respectively.

Under the assumption of a nearly symmetric density and negligible third and higher-order moments, a modified version of the truncated Gaussian second-order filter is implemented.


A continuous-discrete type filter is commonly implemented in two stages. The first stage, a measurement-update stage, processes the observed data according to the discrete filter. The second stage performs the time propagation (integration) of the first and second moments (or higher moments if necessary) of the state over the interval between observations, according to the continuous filter.

This form of filter is particularly suited to the present problem because of the nature of the underwater SONAR system, where the data-acquisition interval is quite long compared to the data-processing rate. The actual algorithm is summarized next [34].


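1. Measurement Update

At the sampling instant $t_k$, abbreviated by $k$, the mean and covariance are computed as

$$\hat{x}_{k+1} = \hat{x}_k + K_k\,[\,y_k - h(\hat{x}_k, k) - b_m\,], \tag{4-6}$$

$$P_{k+1} = P_k - K_k H(\hat{x}_k, k) P_k, \tag{4-7}$$

where the gain is given by

$$K_k = P_k H^T(\hat{x}_k, k) A_k^{-1}, \tag{4-8}$$

$$A_k = H(\hat{x}_k, k) P_k H^T(\hat{x}_k, k) - b_m b_m^T + R_k, \tag{4-9}$$

$$H(\hat{x}_k, k) = \left.\frac{\partial h}{\partial x}\right|_{x=\hat{x}_k}, \quad h \text{ is the measurement function},$$

and where the bias correction term $b_m$ is an $m$-vector with $i$-th component

$$b_{mi}(k) = 0.5\,\mathrm{tr}\left\{ \frac{\partial^2 h_i}{\partial x\,\partial x^T} P_k \right\}_{x=\hat{x}_k}, \quad i = 1, 2, \ldots, m, \tag{4-10}$$

and $m$ is the measurement dimension.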
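The measurement-update stage can be sketched for a single state and a single measurement, where every matrix in the truncated second-order update reduces to a number. This is an illustrative reading of (4-6) to (4-10), not the implementation used for the reported results; the function name, the callables `h`, `dh`, `d2h` and the numerical values are hypothetical.

```python
def tso_measurement_update(x, p, y, r, h, dh, d2h):
    """Scalar truncated second-order measurement update, cf. (4-6)-(4-10).

    x, p : prior mean and variance
    y, r : measurement and its noise variance
    h, dh, d2h : callables giving the measurement function and its first
                 and second derivatives (user-supplied, hypothetical here)
    """
    H = dh(x)                          # measurement Jacobian at the prior mean
    b = 0.5 * d2h(x) * p               # bias correction, scalar form of (4-10)
    a = H * p * H - b * b + r          # residual variance, scalar form of (4-9)
    k = p * H / a                      # gain (4-8)
    x_new = x + k * (y - h(x) - b)     # mean update (4-6)
    p_new = p - k * H * p              # variance update (4-7)
    return x_new, p_new

if __name__ == "__main__":
    # Quadratic measurement h(x) = x^2 with made-up numbers.
    x1, p1 = tso_measurement_update(
        x=1.0, p=0.1, y=1.3, r=0.5,
        h=lambda x: x * x, dh=lambda x: 2.0 * x, d2h=lambda x: 2.0)
    print(x1, p1)
```

For a linear measurement function the second derivative is zero, the bias term vanishes, and the update reduces to the ordinary Kalman measurement update.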




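2. Time propagation between observations

Between observations there is no measurement data, so $\hat{x}$ and $P$ propagate forward in time according to the continuous filter with the initial conditions

$$\hat{x}_{t_k} = \hat{x}_{k+1}, \qquad P_{t_k} = P_{k+1}.$$

Time integration of $\hat{x}$ and $P$ at $t$, $t \in [t_k, t_{k+1}]$, becomes

$$\dot{\hat{x}}_t = f(\hat{x}_t, a_t, t) + b_p, \tag{4-11}$$

$$\dot{P}_t = F(\hat{x}_t, t) P_t + P_t F^T(\hat{x}_t, t) + \widehat{G_t Q(t) G_t^T}, \tag{4-12}$$

where $f(\cdot)$ is the system function with an extra parameter $a_t$ and

$$F(\hat{x}_t, t) = \left.\frac{\partial f}{\partial x}\right|_{x=\hat{x}_t}.$$

The bias correction term $b_p$ is an $n$-vector with $i$-th component

$$b_{pi}(t) = \frac{1}{2}\,\mathrm{tr}\left\{ \frac{\partial^2 f_i}{\partial x\,\partial x^T} P_t \right\}_{x=\hat{x}_t}, \tag{4-13}$$

and, for the system noise function $G_t(x_t)$,

$$\left(\widehat{G_t Q(t) G_t^T}\right)_{ij} = \sum_{k,l=1}^{s} \left[ G_{ik} Q_{kl} G_{jl} + \mathrm{tr}\left\{ \frac{\partial G_{ik}}{\partial x} \frac{\partial G_{jl}}{\partial x^T} P_t \right\} Q_{kl} + \frac{1}{2}\, G_{ik}\, \mathrm{tr}\left\{ \frac{\partial^2 G_{jl}}{\partial x\,\partial x^T} P_t \right\} Q_{kl} + \frac{1}{2}\, \mathrm{tr}\left\{ \frac{\partial^2 G_{ik}}{\partial x\,\partial x^T} P_t \right\} G_{jl}\, Q_{kl} \right]_{x=\hat{x}_t}, \tag{4-14}$$

where $s$ is the dimension of the system noise.

Thus at $t = t_{k+1}$ the initial condition of the first stage becomes, again,

$$\hat{x}_k = \hat{x}_t, \qquad P_k = P_t, \tag{4-15}$$

and the same procedures repeat for the new observed data.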


3. Unobserved system covariance

Another covariance, $\Gamma_t$, is required to compute $I(x_t, y_t)$. This is evaluated according to equation (4-12). Since no measurement is made here, the measurement update is not necessary. Of course, the reference point differs from that of (4-12) except at the initial conditions.
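The three steps above (measurement update, time propagation, and the separately propagated unobserved covariance) can be put together in a small loop. The sketch below assumes a scalar linear model so that the bias terms vanish and the update is the ordinary Kalman one; the mutual information is again taken in the assumed log-ratio form $I = \tfrac{1}{2}\ln(\Gamma/P)$, and the function name and all parameter values are hypothetical.

```python
import math

def run_filter(t_final=100.0, T=10.0, dt=1.0,
               f=0.0, q=0.01, h=1.0, r=1.0, p0=25.0):
    """Continuous-discrete covariance bookkeeping for a scalar linear model.

    P is propagated with Euler steps of size dt and updated every T seconds;
    Gamma is propagated the same way but never updated.  Returns a list of
    (t_k, I_k) with I = 0.5*ln(Gamma/P) evaluated just after each update.
    """
    p = gamma = p0
    out = []
    n_samples = int(round(t_final / T))
    n_sub = int(round(T / dt))
    for k in range(1, n_samples + 1):
        for _ in range(n_sub):                  # time propagation, cf. (4-12)
            p += dt * (2.0 * f * p + q)
            gamma += dt * (2.0 * f * gamma + q)
        gain = p * h / (h * p * h + r)          # linear measurement update
        p -= gain * h * p
        out.append((k * T, 0.5 * math.log(gamma / p)))
    return out

if __name__ == "__main__":
    for T in (20.0, 10.0, 2.0):
        t_end, info = run_filter(T=T)[-1]
        print(f"T={T:4.1f} s  I({t_end:.0f} s) = {info:.3f}")
```

Running the loop with several sampling intervals gives a quick feel for how the measurement rate changes the accumulated information in this toy model.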

With  assigned  parameters  and  algorithms,  simulation  is  conducted 
for  three  different  coordinates.  The  following  are  the  results  found 
from  the  analysis  of  the  simulation  for  the  first  40  minutes. 

Tables 5, 6 and 7 show the mutual information content of the three coordinate systems for various values of the system noise $Q(t)$ and the maneuvering $a_y$. Total system observability is strongest in the modified polar coordinates (Table 6). Rectangular and mixed coordinates show almost the same levels (Tables 5 and 7). Of course, the directly observed variables, bearing ($\beta$) in mixed and MP and range ($r$) in rectangular coordinates, exhibit the strongest observability in all cases. Inspection of all three tables shows that system observability drops significantly as the maneuvering parameter changes from $a_y \neq 0$ (maneuvering exists) to $a_y = 0$ (non-maneuvering). This is best explained by the deterministic observability. As seen in Chapter Two, the system is deterministically observable only when maneuvering exists.

Another notable decrease in observability appears when system noise exists ($Q \neq 0$). This is due to a change of the mutual information quantity from $I_0(x_t, y_t)$ to $I(x_t, y_t)$ (see Chapter Three for notation).

Notice also that the range information is most drastically influenced by the observer maneuvering (3.8 to 0.2 for mixed, 7.4 to -3.0 for modified polar, and 4.3 to 0.9 ($r_x$) and 5.0 to 1.8 ($r_y$) for rectangular coordinates, respectively). In spite of the strongest total observability, the contribution of the range observability to the total observability is the most negligible in the MP case.

Velocity observability generally remains very poor in the non-maneuvering case, or when system noise exists.

The effects of the degree of observability on the range and velocity estimation errors are shown in Figures 10 to 13. Range errors (Figures 10 to 12) converge toward zero for the maneuvering case without system noise (although with different convergence rates), but not for the other cases. For all three coordinates, range errors seem to diverge when $a_y = 0$ and $Q = 0$. At the least, they do not converge to zero in the non-maneuvering case in any sense.

The relatively poor observability of the range variable in the MP system may be the reason why the range error exhibits some oscillatory behavior in Figure 11.

A more desirable convergence is shown by the mixed coordinates if $a_y \neq 0$, $Q = 0$ (Figure 10). Note that the vertical scale for the MP system is different from that of the other two coordinates.

Careful comparison of the observability tables and the corresponding estimation error figures shows that they are very closely related, i.e., the interval of fast information growth corresponds to the interval of abrupt error decrease. Figure 13 shows that velocity errors converge to zero nicely for both mixed and rectangular coordinates when maneuvering exists. This may be due to the strong observability of these variables. Note that the initial velocity error (1 m/s ≈ 2 kt) does not decrease satisfactorily when $a_y = 0$ for both coordinates. The velocity variables are unavailable only for the MP coordinates.


[Table 6. Observability (effects of $Q$ and $a_y$)]

[Table 7. Observability (effects of $Q$ and $a_y$): rectangular coordinates]

[Figure 10. Observability and range error (mixed coordinates) versus time (min)]

[Figure 13. Observability and velocity error]


Observability analysis in terms of the measurement noise $R$ is shown in Tables 8, 9 and 10 with the corresponding range error ($e_r$). As expected, observability decreases as the noise level increases. In particular, observability of the target speed becomes very poor when a high noise level is present.

Comparing the range errors for all coordinates shows that the mixed coordinates exhibit the smallest errors even at the high noise level. This may be an extremely important characteristic from a practical point of view.

Due to the fast information pick-up in the early stage, the range error drops very quickly for the mixed coordinates. For example, within one minute $e_r$ drops to around 10% of its initial value and stays within that value at low noise ($R = (0.2)^2$). However, the rectangular coordinate case takes five minutes and the MP case takes more than twenty minutes. Even though the system observability is high in the MP coordinates, the range error shows quite unstable behavior. This behavior lasts longer as the noise level becomes higher (Table 9). Analysis shows that the instability is due to the instability in $P_t$.

Table 11 shows the effect of the data sampling interval for the mixed coordinates. From the standard 10-second interval, it is extended to 20 seconds or shortened to 2 seconds. More frequent measurement (shorter $T$) makes the system more observable. Specifically, observability of the speed variable in the maneuvering direction (the y-direction here) improves significantly.

However, due to the data-processing speed limitation of the on-board processor as well as other limitations, for example the ping interval for active SONAR or the random process correlation time, one cannot in practice decrease the sampling interval arbitrarily in an underwater tracking system.

One more point which has not appeared here is the effect of the magnitude of the maneuvering. Sensitivity analysis shows that once maneuvering exists, its magnitude does not have any significant influence on the information content. This may also be a very valuable finding from the economic and tactical standpoint.


Table  8,  Effects  of  measurement  noise 


Table  9,  Effects  of  measurement  noise 


Table  10 ,  Effects  of  measurement  noise 


Table  11,  Effects  of  sampling  interval  T  (Mixed, 


4-3.  Information  and  Sensor  number,  Measurement 
Policy  in  Array  SONAR  Tracking 

Another  application  area  where  system  observability  is  crucially 
important  in  the  ocean  environment  is  the  underwater  SONAR  tracking 
problem.  Here,  one  is  interested  in  determination  of  the  number  of 
sensors  and  their  deployment  configuration  such  that  the  system  is 
deterministically  observable  as  well  as  stochastically  more  strongly 
observable.  One  also  wants  to  decide  what  kind  of  quantity  should  be 
measured  to  maximize  the  collected  information  with  the  given 
conditions.  The  last  point  is  more  important  for  our  purpose  here 
since  even  with  the  same  number  of  sensors  and  with  the  same 
deployment  structure,  measurement  of  different  quantities  results  with 
different  degrees  of  observability. 

We have already analyzed the same problem from the deterministic point of view in Chapter Two. We observed that the system is observable except when one absolute time delay is measured with one sensor (1S1abs.D). The other cases are all observable, at least in the wide sense. See Figure 3 for the sensor-target configuration. We also observed that Doppler measurement increases the measurement quantity by a factor of $f_c$ (carrier frequency) compared to the delay measurement.

Here the same problem is analyzed stochastically. Seven measurement policies are chosen as in Chapter Two for the linearly deployed sensors. The standard extended Kalman filter of the discrete

The other parameters used are as follows:

measurement sampling interval: $T$ = 15 sec,

initial condition of $x$ (when no initial noise is added):

$r_x(0)$ = 10000 m
$v_x(0)$ = -15.433 m/s (≈ 30 knots, approaching)
$r_y(0)$ = 4000 m
$v_y(0)$ = 0 m/s
$c_1(0)$ = 1500 m/s
$c_2(0)$ = 1500 m/s

where $x_i(0)$ is assumed $N(\bar{x}_i(0), \sigma_i)$, $i = 1, \ldots, 6$, such that

$\sigma_{r_x}$ = 100 m,
$\sigma_{v_x}$ = 0.15 m/s,
$\sigma_{r_y}$ = 40 m,
$\sigma_{v_y}$ = 0.1 m/s,
$\sigma_{c_1}$ = 5 m/s,
$\sigma_{c_2}$ = 5 m/s.

The measurement noise assumed is also a Gaussian sequence with covariance

$\sigma_{\tau_{12}}$ = 0.019 sec,
$\sigma_{\tau_{23}}$ = 0.026 sec,
$\sigma_{\tau_{13}}$ = 0.016 sec,
$\sigma_{absD}$ = 0.359 sec,
$\sigma$ (Doppler) = 0.1875 Hz,

and $P_0 = \Gamma_0$ is assumed to be

$$P_0 = \mathrm{diag}\left[\ \sigma_{r_x}^2(0) \times 10^4,\ \ \sigma_{v_x}^2(0) \times 5\times10^2,\ \ \sigma_{r_y}^2(0) \times 10^4,\ \ \sigma_{v_y}^2(0) \times 5\times10^2,\ \ \sigma_{c_1}^2(0) \times 10^2,\ \ \sigma_{c_2}^2(0) \times 10^2\ \right],$$

$f_c$ = 3500 Hz (modulation carrier frequency),

$\ell_1$ = 2000 m (intersensor distance of $s_2$ and $s_3$),

$\ell_2$ = 2000 m (intersensor distance of $s_1$ and $s_2$).

With the above parameters, 20 runs are averaged. Table 12 shows the mutual information content of the whole system for the various measurement schemes. Clearly, an increased number of deployed sensors yields stronger observability. In spite of having the largest observation magnitude (notice that the absolute delay is much larger than the relative delay for far-field observation), the 1S1abs.D system shows the weakest observability, due to the unobservable state $x_5$ ($= c_1$).

Inspection of the table also shows that the degree of observability can be approximately categorized in three groups.

1. 1S1abs.D (Obs. = 9.2), 2S1D (10.5), 2S1P (13.4)

2. 2S1D1P (20.6), 3S2D (19.7), 3S3D (21.2)

3. 3S2D1P (30.8)

When only delay or only Doppler is measured with one or two sensors, the system remains weakly observable even when it is deterministically observable (the first group).

Stronger information is obtained when more than one quantity is measured, i.e., both delay and Doppler with two sensors (2S1D1P), or when one more sensor is added to the measurement of only one quantity (3S2D, 3S3D) (the second group).

Information does not increase appreciably with the addition of the same kind of measurement quantity, as can be seen. This may be because the third delay depends entirely on the first two delays: only two delays are independent in the three-sensor delay measurement.
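The dependence of the third delay on the first two can be checked directly. The sketch below assumes a single common sound speed for all paths (the model in the text carries two sound-speed states, $c_1$ and $c_2$, so this is a simplification) and a hypothetical sensor layout on the x-axis with 2000 m spacing; the function name and coordinates are illustrative only.

```python
import math

def relative_delays(target, sensors, c=1500.0):
    """Return (tau12, tau23, tau13) for three sensors and one sound speed c.

    target  : (x, y) position of the source in metres
    sensors : three (x, y) sensor positions in metres
    Assumes straight-line propagation at a common speed c (m/s).
    """
    t = [math.dist(target, s) / c for s in sensors]   # absolute delays
    return t[0] - t[1], t[1] - t[2], t[0] - t[2]

if __name__ == "__main__":
    sensors = [(0.0, 0.0), (2000.0, 0.0), (4000.0, 0.0)]   # hypothetical layout
    tau12, tau23, tau13 = relative_delays((10000.0, 4000.0), sensors)
    print(tau12, tau23, tau13, tau12 + tau23)
```

Because $\tau_{13} = \tau_{12} + \tau_{23}$ holds for every target position under this assumption, the third delay adds a noisy repetition of known information rather than a new independent measurement.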

Stronger and more significant information is obtained when one observes both delay and Doppler with three sensors (the last case).

It is also of interest that most of the information is collected during the very early stages of the observation, i.e., when the first few sets of measurement data are processed.

Information content for the individual measurement policies is shown in Tables 13 through 19. In the case of 1S1abs.D (Table 13) the mutual information about $c_1$ is zero, due to the unobservability of this variable. Observability of $v_x$, $v_y$ and $c_2$ is relatively poor compared to the range variables $r_x$ and $r_y$.

Here  one  can  easily  understand  the  obvious  advantage  of  the 
mutual  information  approach  (in  Shannon's  sense)  compared  to  the 
Fisher  information  matrix  approach.  In  the  current  method,  the 
information  content  of  the  deterministically  observable  individual 
state  estimate  is  calculated  as  well  as  the  total  system  information 
even  if  some  states  are  unobservable.  This  is  not  possible  in  the 


Fisher  information  approach  when  the  information  matrix  is  singular 

(  Compare  Table  12  and  Table  20  ). 
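The contrast with the Fisher approach can be illustrated with a two-state toy case in which the data carry no information about the first state. The sketch assumes diagonal matrices, no dynamics (so $\Gamma = P_0$ and $P^{-1} = P_0^{-1} + J$), and a per-state information of the form $I_i = \tfrac{1}{2}\ln(\Gamma_{ii}/P_{ii})$; this per-state definition and all numbers are assumptions made for illustration, not values from the tables.

```python
import math

def per_state_information(p0_diag, j_diag):
    """Per-state I_i = 0.5*ln(Gamma_ii / P_ii) for diagonal prior and data terms.

    p0_diag : prior variances (Gamma = P0, since no dynamics are assumed)
    j_diag  : diagonal of the data (Fisher) information matrix J
    The posterior satisfies P^-1 = P0^-1 + J, which stays invertible even
    when J itself is singular.
    """
    infos = []
    for p0, j in zip(p0_diag, j_diag):
        p = 1.0 / (1.0 / p0 + j)
        infos.append(0.5 * math.log(p0 / p))
    return infos

if __name__ == "__main__":
    j_diag = [0.0, 4.0]                     # first state unobserved: det(J) = 0
    infos = per_state_information([100.0, 25.0], j_diag)
    print("det J =", j_diag[0] * j_diag[1])
    print("per-state information:", infos, " total:", sum(infos))
```

Here $J$ cannot be inverted, yet the per-state and total information remain well defined, with zero information for the unobserved state, which mirrors the zero $c_1$ column of Table 13.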


Table  12,  System  observability  of  array  SONAR 




Table 13, Observability : 1S1abs.D

t (min)   Total    r_x    v_x    r_y    v_y    c_1    c_2
 0.25      4.35   3.11   0.00   0.03   0.00    0.0   0.01
 0.50      5.03   3.27   0.00   0.03   0.00    0.0   0.01
 0.75      5.42   3.32   0.00   0.03   0.00    0.0   0.01
 1.00      5.70   3.33   0.01   0.04   0.00    0.0   0.01
 1.25      5.92   3.34   0.01   0.06   0.00    0.0   0.01
 1.50      6.14   3.35   0.02   0.13   0.00    0.0   0.01
 1.75      6.48   3.44   0.03   0.36   0.00    0.0   0.01
 2.00      6.99   3.65   0.04   0.78   0.01    0.0   0.02
 2.25      7.57   3.94   0.04   1.27   0.01    0.0   0.02
 2.50      8.12   4.19   0.04   1.73   0.01    0.0   0.03
 2.75      8.52   4.33   0.04   2.04   0.01    0.0   0.03
 3.00      8.64   4.29   0.05   2.10   0.01    0.0   0.03
 3.25      8.72   4.26   0.06   2.13   0.02    0.0   0.03
 3.50      8.80   4.22   0.08   2.16   0.02    0.0   0.04
 3.75      8.87   4.19   0.10   2.19   0.03    0.0   0.04
 4.00      8.94   4.15   0.12   2.21   0.04    0.0   0.04
 4.25      9.00   4.11   0.13   2.23   0.05    0.0   0.04
 4.50      9.06   4.07   0.15   2.26   0.05    0.0   0.05
 4.75      9.12   4.03   0.16   2.28   0.06    0.0   0.05
 5.00      9.17   3.99   0.16   2.31   0.08    0.0   0.05

Table 15, Observability : 2S1P

t (min)   Total    r_x    v_x    r_y           v_y    c_1    c_2
 0.25      6.65   0.59   0.03   0.15          0.00   0.23   0.20
 0.50      7.22   0.84   0.05   0.16          0.00   0.26   0.25
 0.75      7.94   1.28   0.11   0.18          0.01   0.39   0.34
 1.00      8.57   1.67   0.15   0.21          0.01   0.45   0.37
 1.25      9.12   1.82   0.20   0.27          0.02   0.49   0.40
 1.50      9.47   1.98   0.24   0.34          0.03   0.50   0.41
 1.75      9.81   2.13   0.27   0.47          0.03   0.50   0.41
 2.00     10.09   2.24   0.31   0.64          0.04   0.53   0.43
 2.25     10.32   2.34   0.36   0.81          0.05   0.53   0.43
 2.50     10.56   2.46   0.40   0.90          0.05   0.55   0.44
 2.75     10.81   2.56   0.44   0.97          0.06   0.56   0.45
 3.00     11.25   2.70   0.48   [illegible]   0.07   0.56   0.46
 3.25     11.45   2.79   0.51   1.29          0.07   0.56   0.46
 3.50     11.69   2.90   0.54   1.36          0.08   0.56   0.46
 3.75     12.12   3.02   0.59   1.48          0.09   0.56   0.46
 4.00     12.36   3.15   0.63   1.55          0.10   0.56   0.46
 4.25     12.63   3.25   0.67   1.63          0.11   0.56   0.46
 4.50     12.89   3.32   0.72   1.70          0.13   0.56   0.46
 4.75     13.11   3.39   0.78   1.73          0.14   0.56   0.46
 5.00     13.44   3.48   0.84   1.31          0.15   0.56   0.46

Table 16, Observability : 2S1D1P

t (min)   Total    r_x    v_x    r_y    v_y    c_1    c_2
 0.25     12.31   0.93   0.03   0.55   0.00   0.53   0.65
 0.50     13.52   1.19   0.07   0.79   0.02   0.64   0.65
 0.75     14.57   1.47   0.21   1.12   0.06   0.65   0.67
 1.00     15.53   1.78   0.32   1.47   0.14   0.66   0.68
 1.25     16.40   2.01   0.46   1.71   0.25   0.66   0.69
 1.50     16.91   2.13   0.52   1.82   1.82   0.36   0.70
 1.75     17.34   2.25   0.58   1.88   0.46   0.66   0.71
 2.00     17.75   2.37   0.65   1.95   0.53   0.67   0.71
 2.25     18.15   2.48   0.73   2.00   0.57   0.67   0.72
 2.50     18.41   2.55   0.75   2.00   0.69   0.67   0.72
 2.75     18.65   2.62   0.79   1.99   0.62   0.67   0.72
 3.00     18.90   2.70   0.82   2.00   0.65   0.67   0.72
 3.25     19.20   2.81   0.83   2.07   0.73   0.67   0.73
 3.50     19.40   2.90   0.83   2.08   0.77   0.67   0.72
 3.75     19.58   2.98   0.84   2.09   9.80   0.66   0.72
 4.00     19.70   3.09   0.65   2.12   0.84   0.66   0.72
 4.25     19.99   3.20   0.86   2.15   0.89   0.66   0.72
 4.50     20.18   3.30   0.87   2.18   0.93   0.66   0.72
 4.75     20.39   3.42   0.90   2.22   1.00   0.66   0.72
 5.00     20.64   3.53   0.94   2.27   1.09   0.66   0.72

•  •  .  *  1 
.  V*  -N 

*•  .N 


Table  19,  Observability  :  3S2D1P 


  t     Total    r_x    v_x    r_y    v_y    c1     c2

 0.25   21.74   1.73   0.06   1.46   0.00   1.48   0.84
 0.50   22.86   1.93   0.12   1.64   0.18   1.61   0.94
 0.75   23.82   2.13   0.19   1.87   0.39   1.81   1.16
 1.00   24.70   2.41   0.25   2.21   0.59   2.05   1.38
 1.25   25.50   2.63   0.35   2.47   0.77   2.25   1.64
 1.50   26.09   2.78   0.44   2.62   0.94   2.35   1.78
 1.75   26.58   2.94   0.50   2.77   1.10   2.42   1.84
 2.00   27.07   3.08   0.59   2.89   1.23   2.46   1.98
 2.25   27.43   3.19   0.67   2.97   1.34   2.46   1.98
 2.50   27.84   3.29   0.74   3.06   1.44   2.47   2.00
 2.75   28.15   3.39   0.81   3.14   1.54   2.45   2.01
 3.00   28.49   3.51   0.85   3.24   1.63   2.45   2.02
 3.25   28.81   3.64   0.87   3.36
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some  unobservable  states.  In  this  case  the  total  or  individual  state 
information  cannot  be  computed  because  of  this  singularity.  In  that 
case  even  identification  of  unobservable  states  is  not  generally 
possible. To identify those unobservable states, intuition,
experience, or trial and error is usually used. All the other six
measurement policies (Tables 14 to 19) show that the system is
observable, although the degree of observability differs. As the
number of sensors increases, unobservable or weakly observable states
become more strongly observable. Specifically, information growth for
the sound speed variables c1 and c2 is significant when the
three-sensor policy is used, regardless of the measured quantity.

Strong  system  observability  for  the  3S2D1P  case  is  due  to  the 
strong  individual  state  information  for  all  six  states. 

The effect of the different degrees of information content on the
filtering errors is seen in Figures 14 through 16 for range, target
speed, and sound speed, respectively.

Roughly, increasing the number of measured quantities with more
sensors gives a smaller filtering error because of the stronger
observability. With an initial range error of 1,000 m, the combined
measurement of delay and Doppler yields significantly smaller errors.
The errors stay within a few tens of meters at the final time of 5
minutes for both the 2S1D1P and 3S2D1P cases. The 3S2D1P case, in
particular, shows very desirable characteristics, as can be seen from
Figure 14. It is important to note here that very undesirable
properties (in the sense that a large error or oscillation of the
range error results) appear when measuring only time delay. The same
figure also shows large (more
than  initial  error)  errors  in  the  case  of  2S1D,  "SOD,  and  some 
overshoot appears for 1S1D even with reasonably good range
information. Notice that 1S1D has only limited usage, e.g., with
target-sensor synchronization or in an active SONAR situation.

One can now say that Doppler measurement, combined with proper delay
measurement, is crucial for good range estimation in SONAR tracking.

Figure 15 shows the target velocity error with an initial error of
2 m/s (~4 knots). Here one can observe some aspects that differ from
the range error: no matter what quantity is measured, the system
exhibits less velocity error when more sensors are included with an
increased number of measured variables. Figure 15 also shows that the
magnitude of this error can be divided into three groups, exactly as
the total system observability is divided. The first group (1S1D,
2S1D, 2S1P) again shows the poorest performance and the third group
(3S2D1P) is the superior group.

1S1D shows some oscillatory properties here as well. Extended
observation beyond five minutes showed that the error in the 2S1D1P
case decreases from around 5 1/2 minutes.

Figure 16 shows the evolution of the sound speed error for an initial
error of 50 m/s. This value may be slightly larger than in a
practical situation.

However, one can easily recognize three distinct groups of error
trends. These three groups exactly coincide with the groups made in
the system observability, i.e., the 1S1D, 2S1D, 2S1P group shows the
poorest performance, the 2S1D1P, 3S2D, 3S3D group shows the
medium-range error and, again, 3S2D1P shows the superior performance.
The 1S1D case shows a mild overshoot with the weakest information.

For comparison, a discrete version of the Fisher information matrix
(3-65) is computed for the selected five observation policies:

$$ I(k,1) = \sum_{i=1}^{k} \Phi^{-T}(k,i)\, H^{T}(x(i))\, R^{-1}(i)\, H(x(i))\, \Phi^{-1}(k,i). \tag{4-16} $$

Here, the iterative modification of (3-67),

$$ I(k+1,1) = \Phi^{-T}(k+1,k)\, I(k,1)\, \Phi^{-1}(k+1,k) + H^{T}(x(k+1))\, R^{-1}(k+1)\, H(x(k+1)), \tag{4-17} $$

is used instead of (4-16). The result is shown in Table 20. The
matrix I(k,1) remains singular over the entire observation period for
the 1S1D measurement case and remains nonsingular, with the magnitude
of the determinant shown, in the other cases.
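
The recursion (4-17) and the singularity test can be sketched as follows. This is only a minimal illustration under stated assumptions: the two-state model (position and velocity), the transition matrix, the measurement rows and the noise variance are hypothetical and are not the SONAR model of the text.

```python
# Minimal sketch of the recursion (4-17) for a hypothetical 2-state linear model.
# State = [position, velocity]; Phi, H and R are illustrative assumptions only.

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def fisher_history(h_row, r_var, dt, steps):
    """Return det I(k,1) for k = 1..steps using the recursion (4-17)."""
    phi_inv = inv2([[1.0, dt], [0.0, 1.0]])   # Phi^{-1}(k+1,k)
    phi_inv_t = transpose(phi_inv)            # Phi^{-T}(k+1,k)
    h = [list(h_row)]                         # 1 x 2 measurement matrix H
    # H^T R^{-1} H for a scalar measurement with variance r_var
    new_info = [[v / r_var for v in row] for row in matmul(transpose(h), h)]
    info = new_info                           # I(1,1): only the i = 1 term
    dets = [det2(info)]
    for _ in range(steps - 1):
        info = add(matmul(matmul(phi_inv_t, info), phi_inv), new_info)
        dets.append(det2(info))
    return dets

# Measuring position makes I(k,1) nonsingular from the second sample on;
# measuring velocity alone leaves it singular (position is never observed).
position_only = fisher_history([1.0, 0.0], r_var=1.0, dt=1.0, steps=4)
velocity_only = fisher_history([0.0, 1.0], r_var=1.0, dt=1.0, steps=4)
print(position_only)
print(velocity_only)
```

In this toy case a determinant that stays at zero plays the role of the unobservable period marked in Table 20.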

Comparison of this table with the total information contents
(Table 12) reveals that the two approaches correspond exactly to
each other for the chosen five measurement policies.

The superiority of the 3S2D1P measurement system is shown here as
well. Thus one can conclude this section as follows: at least two
sensors are required for the system to be observable. The 3S2D1P
measurement gives the most desirable performance in all cases. If
only two sensors are available, a combination of delay and Doppler
(2S1D1P) measurement is strongly recommended. For a small range
error, Doppler measurement is crucial. For small target velocity and
sound speed errors, include as many sensors as possible to make the
system strongly observable.

1S1D  policy  is  not  recommended  except  in  special  cases  as  in  the 
experimentally  well  synchronized  case  [64]  or  in  an  active  SONAR 
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Table  20 , 

Observability  (singularity  of  Fisher  information 
matrix) 


*  * — ►  implies  unobservable  period 


CHAPTER 5: SUMMARY AND CONCLUSION


In  this  dissertation,  system  observability  is  studied  for  both 
deterministic  and  stochastic  cases. 

Since nonlinear observability for deterministic systems is a
geometric nonlinear functional property, the inverse and implicit
function theorems are useful. By modifying the global
implicit-function theorem, sufficient conditions for the given
nonlinear function to be globally homeomorphic are derived. From an
application point of view, the nonzero Jacobian condition, which can
be related to n-1 dimensions for the special case, provides the
connectedness condition for every state to be connected to the
measurement space. However, the finite-covering condition must be
tightened to a one-covering condition, by which univalence of the
connection can be guaranteed.

Before these two conditions can be applied to the system equations,
the system observation equations must first be differentiated with
respect to t, and the lower-order derivatives of the observation
equations substituted into the higher-order ones, up to the (n-1)-th
derivative.

Depending on how the conditions are satisfied, observability in the
strict sense, observability in the wide sense, and unobservable
states are determined.


Application of this method is demonstrated by several examples. In
particular, two practical problems in the ocean environment, where
observability is very important, are dealt with here.

The bearing-only-target tracking problem, which is described in the
so-called mixed-coordinate system, is analyzed. The fact that this
system is observable only when relative maneuvering exists, and
unobservable when there is no maneuvering, is proven again, and
special cases of interest are studied. In the linear-array SONAR
problem, at least two sensors are necessary for system observability.
Doppler measurement scales up the delay measurement quantity by a
factor of the modulation carrier frequency.

For stochastic system observability, a new approach is attempted.
Instead of using the classical Fisher information matrix, mutual
information (in the Shannon sense) is computed and used as a
criterion for determining the degree of observability. Computed here
is the amount of information about one random process (state x_t)
contained in another random process (observation y_t). Since, from
information theory, the mutual information is defined as the
uncertainty or entropy difference between the sender and the receiver
of the information, two entropies, H(x_t) and H(x_t|y_t), must be
known.

Fortunately, entropy and variance have a one-to-one relationship
except in a few special cases. So, mutual information can be computed
from two covariances, the a priori and a posteriori statistical
covariances, as long as both are available. Since, in practice,
higher moments are required in the evaluation of the second moment of
the density, an approximation must be used to obtain it. Any
well-developed approximate nonlinear filter algorithm can be used.
Since,
covariance  in  this  case  does  not  fully  characterize  the  statistics  of 
the  state  xt  mutual  information  I  (x^.y^)  is  used  instead  of  . 
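
The computation of mutual information from the two covariances can be sketched as follows. This is a minimal sketch assuming that both the a priori and a posteriori densities are approximated as Gaussian, in which case the mutual information reduces to 0.5 ln(det P_prior / det P_post). The per-state measure taken from the diagonal variances, the function names and the covariance values are illustrative assumptions, not definitions taken from the text.

```python
import math

def det(m):
    """Determinant by Laplace expansion; adequate for the small matrices used here."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, pivot in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * pivot * det(minor)
    return total

def total_information(p_prior, p_post):
    """Gaussian mutual information 0.5 * ln(det P_prior / det P_post), in nats."""
    return 0.5 * math.log(det(p_prior) / det(p_post))

def state_information(p_prior, p_post):
    """Per-state information from the diagonal (marginal) variances (assumed form)."""
    return [0.5 * math.log(p_prior[i][i] / p_post[i][i])
            for i in range(len(p_prior))]

# Hypothetical covariances: state 0 is well observed, state 1 is not observed.
p_prior = [[4.0, 0.0], [0.0, 9.0]]
p_post = [[1.0, 0.0], [0.0, 9.0]]

total = total_information(p_prior, p_post)
per_state = state_information(p_prior, p_post)
# States whose information does not grow are candidates for unobservable states.
stalled = [i for i, info in enumerate(per_state) if info < 1e-9]
print(total, per_state, stalled)
```

Unlike a determinant-based Fisher test, this quantity stays finite when one state receives no information, which is what allows the stalled state to be singled out.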

The  relationship  between  the  deterministic  observability  rank 
condition  and  the  stochastic  observability  in  terms  of  mutual 
information  is  discussed  for  the  linear  system. 

Obvious advantages of the mutual information approach over the
Fisher information approach in practical applications are:

1) System observability computation is possible even when some
states are unobservable. This is not possible with the Fisher
information because of the singularity of the observability
matrix.

2) Identification of unobservable states is immediate: they are the
states whose information does not grow. This is very difficult with
the Fisher information, where it can only be done by empirical
guessing or trial and error [71].

3) Both mutual information and Fisher information consider both
system and measurement noise effects, theoretically. But the
Fisher information matrix in its applied form only accommodates
measurement noise.

4)  The  Fisher  information  matrix  for  the  nonlinear  system, 

traditionally,  uses  the  first-order  linearization.  But  Shannon 


Modified polar coordinates show some unstable characteristics in
spite of their strong observability magnitude.

At least two sensors are required for every state to be observable
in the array SONAR tracking problem.

The 3S2D1P measurement policy is the most recommendable if
deployment of up to three sensors is available.

If only two sensors are available, a combination of delay and
Doppler (2S1D1P) measurement is the most recommendable policy. For a
small range error, Doppler measurement is crucially important. On
the other hand, for small target velocity and sound-speed errors,
include as many sensors as possible to make the system more strongly
observable, since those errors are proportional to the whole system
observability.

1S1D is not recommendable except for particularly well-synchronized
experimental cases.

As  a  result,  for  the  deterministic  observability  problem,  two 
simple  and  convenient  conditions  -  connectedness  and  univalence  -  are 
developed. 

For  stochastic  observability,  it  is  found  that  the  mutual 
information  approach  is  a  valid  alternative  which  seemingly  can 
determine  the  degree  of  observability  more  completely  than  the 
classical  Fisher  information  matrix. 

The effect of deterministic observability on stochastic
observability and related topics are analyzed for the BOT and array
SONAR tracking simulations.
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APPENDIX  A:  FUNCTIONS  AND  FUNCTIONAL  DEPENDENCE  [24] ,[31] 


Definitions

Consider a real-valued continuous n-vector function $f: R^n \to R^n$.
A function $f: X \to Y$, $X \subset R^n$, $Y \subset R^n$, is
one-to-one from X into Y if for every $y \in R(f)$, the range of f,
there is exactly one $x \in X$ such that $y = f(x)$. Then f has a
left-inverse g if and only if f is one-to-one, i.e., there exists a
function $g: Y \to X$ such that

$$ g \circ f = g(f(X)) = I_X, \tag{A-1} $$

where $I_X$ is the identity function for X. If every $y \in Y$ is the
image of at least one $x \in X$, then f is an onto function. In this
case f has a right-inverse g such that

$$ f \circ g = f(g(Y)) = I_Y. \tag{A-2} $$

If f is one-to-one and onto, then it is said to be a one-to-one
correspondence. A function f is invertible if and only if it is a
one-to-one correspondence, and it thereby has left and right inverses
which are equal. A function f is a homeomorphism if it is a
one-to-one correspondence and has a continuous inverse $f^{-1}$.
Further, if f is continuously differentiable, i.e., a $C^1$ function,
then f is called a diffeomorphism. A $C^1$-diffeomorphism means that
the inverse $f^{-1}$ exists and is also of class $C^1$. So, the
invertibility property of f can be diagrammed as follows:


is f one-to-one?
   yes -> is f onto?
             yes -> f has unique inverse (f invertible)
             no  -> f has only left inverses
   no  -> is f onto?
             yes -> f has only right inverses
             no  -> f has no inverse
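
For a function on a finite set, the classification in the diagram can be checked directly. The following is a minimal sketch: the function name `classify_inverse` and the dictionary representation of f are illustrative assumptions, and the finite setting only mirrors the logic of the diagram, not the continuous maps studied in the text.

```python
def classify_inverse(mapping, codomain):
    """Classify a finite function given as a dict {x: f(x)} against its codomain."""
    images = list(mapping.values())
    one_to_one = len(set(images)) == len(images)   # no two x share an image
    onto = set(images) == set(codomain)            # every y is hit at least once
    if one_to_one and onto:
        return "unique inverse (invertible)"
    if one_to_one:
        return "only left inverses"
    if onto:
        return "only right inverses"
    return "no inverse"

print(classify_inverse({1: "a", 2: "b"}, {"a", "b"}))
print(classify_inverse({1: "a", 2: "b"}, {"a", "b", "c"}))
print(classify_inverse({1: "a", 2: "a"}, {"a"}))
print(classify_inverse({1: "a", 2: "a"}, {"a", "b"}))
```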

Functional dependence

A real-valued n-vector function $f = (f_1, f_2, \ldots, f_n)^T$ is
functionally dependent in an open subset G of X if there exists a
function $\psi$ from $R^n$ to $R^1$ such that

$$ \psi(f_1, f_2, \ldots, f_n) = 0 \quad \text{for } x \in G. \tag{A-3} $$

The following theorems are now introduced. Proofs can be found in
many standard analysis texts, for example [21], [24], [26].

Theorem A-1 [26]

Let X be a subset of $R^n$. If $f: X \to R^n$ is a $C^1$ function in
an open set $G \subset X$ and the Jacobian J of f is not identically
zero for $x \in G$, then $f_1, f_2, \ldots, f_n$ are functionally
independent in G.
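
As a small worked illustration of Theorem A-1 for n = 2, consider the two pairs of functions below. Both pairs are hypothetical examples chosen for this sketch and are not taken from the text. The first pair has a nonzero Jacobian and is therefore independent; the second pair is dependent through an explicit $\psi$, and its Jacobian vanishes identically, as the contrapositive of the theorem requires.

```latex
% Example 1 (independent): f_1 = x_1 + x_2, f_2 = x_1 - x_2
\[
J = \det\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = -2 \neq 0 ,
\]
% so, by Theorem A-1, f_1 and f_2 are functionally independent.

% Example 2 (dependent): f_1 = x_1 + x_2, f_2 = (x_1 + x_2)^2
\[
J = \det\begin{pmatrix} 1 & 1 \\ 2(x_1+x_2) & 2(x_1+x_2) \end{pmatrix} \equiv 0 ,
\qquad
\psi(f_1, f_2) = f_1^2 - f_2 = 0 .
\]
```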


Remarks

$\det J_f(x) \neq 0$ for some x implies [15], [32] that for any
interior point y of R(f) there exists a neighborhood of y where the
$C^1$ inverse $f^{-1}$ of f is defined. Further, if $f^{-1}$ is
unique globally, then f is a $C^1$-diffeomorphism and f maps $R^n$
onto itself as a one-to-one correspondence. If $f^{-1}$ is not unique
globally, then the following inverse function theorem may be used to
restrict the domain X on which f is one-to-one.

Theorem A-2 Inverse function theorem [26]

Let x be an interior point of a set X in $R^n$ and suppose the
function $f: X \to Y$, $Y \subset R^n$, satisfies the following:

i) f is of class $C^1$,

ii) $\det J_f(x) \neq 0$.

Then there exists an open set U containing x such that the
restriction of f to U, $f|_U$, is one-to-one. The inverse $f^{-1}$ is
also $C^1$ on the open set V = f(U).
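
The local nature of Theorem A-2 can be seen in a standard textbook map, sketched below. The map f(x, y) = (e^x cos y, e^x sin y) is an assumed example that does not appear in the text. Its Jacobian determinant is e^(2x), which is nonzero everywhere, so f is one-to-one near every point, yet two distinct points share the same image, so f is not one-to-one globally.

```python
import math

def f(x, y):
    """Hypothetical example map f(x, y) = (e^x cos y, e^x sin y)."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def jacobian_det(x, y):
    """Analytic Jacobian determinant of f; it simplifies to e^(2x) > 0."""
    ex = math.exp(x)
    # J = [[ex*cos y, -ex*sin y], [ex*sin y, ex*cos y]]
    return (ex * math.cos(y)) ** 2 + (ex * math.sin(y)) ** 2

p = (0.0, 0.0)
q = (0.0, 2.0 * math.pi)   # a different point with the same image as p
same_image = all(abs(a - b) < 1e-12 for a, b in zip(f(*p), f(*q)))
print(jacobian_det(*p), jacobian_det(*q), same_image)
```

This is the situation in which the nonzero Jacobian (connectedness) condition holds while univalence fails, so a further one-covering condition is needed for a global statement.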

A generalization of the inverse function theorem to functions of the
form $f: R^n \times R^r \to R^m$, where m is not necessarily equal to
n, is the following implicit function theorem.

Theorem A-3 Implicit function theorem [21]

Let $(x, v)^T$, $x \in R^n$, be an interior point of a set E in
$R^n \times R^r$ and suppose that the function $f: E \to R^n$
satisfies the following conditions:


ii) $f(\cdot)$ is $C^1$ at (x, v),

iii) $\det J_f(x, v) \neq 0$.

Then there exist neighborhoods N, R of x, v given by

N = [x-a] x [x+a],

R = [v-b] x [v+b],

where a, b are proper real constant vectors, and a $C^1$ function
$g: R \to N$ such that
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APPENDIX  B:  DETERMINATION  OF  THE  MAXIMUM  ENRTOPY  DENSITY 

The maximum entropy density function is derived next. It is useful
in computing an upper bound on the information content of an
arbitrary random variable or random process.

Consider a scalar random variable x which has density p(x), where
the form of p(x) is not known. Then from (3-4),

$$ H(x) = -\int_{-\infty}^{\infty} p(x) \ln p(x)\, dx. \tag{B-1} $$

One wants to find the p(x) which maximizes (B-1) under some
constraints. The maximum entropy density function p(x) changes as the
range of x and the constraints change. Suppose first that one wants
to maximize (B-1) with

$$ x \in [0, a], \qquad \int_{0}^{a} p(x)\, dx = 1; $$

then, with the help of the calculus of variations, one can compute
the maximum entropy density p(x) as

$$ p(x) = \begin{cases} 1/a, & 0 < x < a, \\ 0, & \text{elsewhere}, \end{cases} \tag{B-3} $$

i.e., the uniform density gives the maximum entropy in this case.


But if the range of x is changed to $x \in [0, \infty)$ and the
constraints are changed to

$$ \int_{0}^{\infty} p(x)\, dx = 1, \qquad \int_{0}^{\infty} x\, p(x)\, dx = E[x] = m, \tag{B-4} $$

then the result is

$$ p(x) = \frac{1}{m} \exp\left\{ -\frac{x}{m} \right\}, \quad 0 < x < \infty, \tag{B-5} $$

i.e., the one-sided exponential density is the maximum entropy
density.

More generally, if $x \in (-\infty, \infty)$, with the constraints

$$ \int_{-\infty}^{\infty} p(x)\, dx = 1, \qquad \int_{-\infty}^{\infty} x\, p(x)\, dx = m, \qquad \int_{-\infty}^{\infty} (x-m)^2 p(x)\, dx = \operatorname{var} x = \sigma^2, \tag{B-6} $$

then one can prove the following important result [42], [49], [67].

Theorem B-1

For $x \in (-\infty, \infty)$ with the constraints (B-6), the maximum
entropy density function p(x) is a Gaussian density.


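Proof

The problem is to show that the solution p(x) which maximizes (B-1),
i.e.,

$$ \max \left\{ -\int_{-\infty}^{\infty} p(x) \ln p(x)\, dx \right\} $$

with the given range and constraints, has the form

$$ p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x-m)^2}{2\sigma^2} \right\}. \tag{B-7} $$

The Lagrangian M for this problem is

$$ M = -\int p(x) \ln p(x)\, dx + \lambda \left[ 1 - \int p(x)\, dx \right] + \gamma \left[ m - \int x\, p(x)\, dx \right] + \beta \left[ \sigma^2 - \int (x-m)^2 p(x)\, dx \right], \tag{B-8} $$

where $\lambda$, $\gamma$, $\beta$ are Lagrangian multipliers. Using
the calculus of variations with

$$ \delta\phi(p) = (d\phi/dp)\, \delta p, $$

$$ \delta M = -\int \left\{ \ln p(x) + 1 + \lambda + \gamma x + \beta (x-m)^2 \right\} \delta p\, dx. \tag{B-9} $$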

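By setting $\delta M = 0$, (B-9) gives

$$ \ln p(x) + 1 + \lambda + \gamma x + \beta (x-m)^2 = 0, \tag{B-10} $$

or

$$ p(x) = \exp\left\{ -1 - \lambda - \gamma x - \beta (x-m)^2 \right\}. \tag{B-11} $$

Substituting (B-11) into (B-6) and solving for $\lambda$, $\gamma$,
$\beta$ yields the Gaussian density

$$ p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x-m)^2}{2\sigma^2} \right\}. \tag{B-12} $$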
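
The step of solving for the multipliers can be spelled out as follows. This is a sketch of the standard calculation, written out here for convenience and not quoted from the text; it uses only the form (B-11) and the three constraints (B-6).

```latex
% Mean constraint: completing the square in (B-11) gives a density
% centered at m - \gamma/(2\beta), so E[x] = m requires
\[ \gamma = 0 . \]
% Variance constraint: p(x) \propto \exp\{-\beta (x-m)^2\} has
% variance 1/(2\beta), hence
\[ \beta = \frac{1}{2\sigma^2} . \]
% Normalization: \int \exp\{-\beta (x-m)^2\}\,dx = \sqrt{\pi/\beta}, hence
\[ e^{-1-\lambda} = \sqrt{\frac{\beta}{\pi}} = \frac{1}{\sqrt{2\pi}\,\sigma} . \]
```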


So, for any distribution of x with variance $\sigma^2$, the
following relation holds:

$$ H(x) \le H_G(x) = \tfrac{1}{2} \ln (2\pi e \sigma^2), \tag{B-13} $$

where $H_G(x)$ is the Gaussian entropy.

Table B-1 shows H(x) of commonly used density functions for fixed
variance. Note that the Gaussian density has the largest entropy.
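
The bound (B-13) can be checked numerically for a few densities with the same variance. The sketch below uses the standard closed-form entropies of the Gaussian, uniform and Laplace densities. The choice of these three densities is an assumption made for illustration and is not a reproduction of Table B-1.

```python
import math

def gaussian_entropy(sigma):
    """Entropy of a Gaussian density with standard deviation sigma, in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def uniform_entropy(sigma):
    """Entropy of a uniform density whose variance is sigma^2."""
    width = sigma * math.sqrt(12.0)     # variance of a uniform density is width^2 / 12
    return math.log(width)

def laplace_entropy(sigma):
    """Entropy of a Laplace (two-sided exponential) density with variance sigma^2."""
    scale = sigma / math.sqrt(2.0)      # variance of a Laplace density is 2 * scale^2
    return 1.0 + math.log(2.0 * scale)

sigma = 1.0
entropies = {
    "gaussian": gaussian_entropy(sigma),
    "laplace": laplace_entropy(sigma),
    "uniform": uniform_entropy(sigma),
}
for name, value in entropies.items():
    print(f"{name:9s} {value:.4f}")
```

For equal variance the Gaussian value is the largest of the three, in agreement with (B-13).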
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